I'm working with BigQuery's Hacker News dataset and was looking at which URLs have the most news stories. I'd also like to strip out the domain names and see which of those have the most news stories. I'm working in R and am having a bit of trouble getting the following query to work.
sql_domain <- "SELECT url,
REPLACE(CASE WHEN REGEXP_CONTAINS(url, '//')
THEN url ELSE 'http://' + url END, '&', '?') AS domain_name,
COUNT(domain_name) as story_number
FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'story'
GROUP BY domain_name
ORDER BY story_number DESC
LIMIT 10"
I've been getting the following error: "Error: No matching signature for operator + for argument types: STRING, STRING. Supported signatures: INT64 + INT64; FLOAT64 + FLOAT64; NUMERIC + NUMERIC"
Can't for the life of me figure out a replacement for the "+" operator. Your help is much appreciated!
In BigQuery, instead of 'http://' + url you should use CONCAT('http://', url) (or the || operator: 'http://' || url).
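As a rough local analogue of what the original query was trying to do, here is a minimal Python sketch that applies the same "prepend http:// when there is no //" rule and then counts stories per host. It assumes a plain list of URL strings; `normalize`, `host_of` and `top_hosts` are hypothetical helper names. Note that it groups by hostname (www.nytimes.com), which is not the same as the registered domain (nytimes.com) that BigQuery's NET.REG_DOMAIN() returns.

```python
from collections import Counter
from urllib.parse import urlsplit

def normalize(url: str) -> str:
    # Same rule as CASE WHEN REGEXP_CONTAINS(url, '//') THEN url
    # ELSE CONCAT('http://', url) END: add a scheme only when it is missing
    return url if "//" in url else "http://" + url

def host_of(url: str) -> str:
    # .hostname is lower-cased and drops any port; fall back to "" if absent
    return urlsplit(normalize(url)).hostname or ""

def top_hosts(urls, n=10):
    # Skip empty urls, then count stories per host, most common first
    return Counter(host_of(u) for u in urls if u).most_common(n)
```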
For your goals (top domains submitting to Hacker News):
#standardSQL
SELECT NET.REG_DOMAIN(url) domain, COUNT(*) c
, ROUND(EXP(AVG(LOG(IF(score<=0,0.1,score)))),2) avg_score
FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'story'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100
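The avg_score column is a geometric mean: the exponential of the average natural log of the scores, with non-positive scores clamped to 0.1 so the log is defined. A small Python sketch of the same arithmetic (the function name is hypothetical):

```python
import math

def geometric_mean_score(scores):
    # Mirrors ROUND(EXP(AVG(LOG(IF(score<=0,0.1,score)))),2)
    if not scores:
        return None  # AVG over an empty group is NULL in SQL
    clamped = [s if s > 0 else 0.1 for s in scores]
    mean_log = sum(math.log(s) for s in clamped) / len(clamped)
    # Note: Python's round() rounds exact halves to even, unlike BigQuery's ROUND
    return round(math.exp(mean_log), 2)
```

Compared with a plain AVG(score), a geometric mean is much less sensitive to a handful of very high-scoring stories.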
Note how much easier it is to call NET.REG_DOMAIN() to get the domain.
Related
I've got a model method that conditionally concatenates the user's username ("login") and real name if they've saved a real name; otherwise it just shows the username. I'd like to rewrite the query in ActiveRecord or Arel.
It looks like I should use an Arel::Nodes::NamedFunction, but I don't understand how to do the conditional concatenation with a named function. (Does Arel know about "if"? I can't find any reference in the docs.)
def primer_values
connection.select_values(%(
SELECT CONCAT(users.login,
IF(users.name = "", "", CONCAT(" <", users.name, ">")))
FROM users
ORDER BY IF(last_login > CURRENT_TIMESTAMP - INTERVAL 1 MONTH,
last_login, NULL) DESC,
contribution DESC
LIMIT 1000
)).uniq.sort
end
There's also similarly a conditional in ORDER BY.
While I generally abhor raw SQL in Rails, given this usage I'd leave it as is, although I might change it to something a bit more idiomatic, like:
User
.order(
Arel.sql("IF(last_login > CURRENT_TIMESTAMP - INTERVAL 1 MONTH,last_login, NULL)").desc,
User.arel_table[:contribution].desc)
.limit(1000)
.pluck(Arel.sql(
'CONCAT(users.login,
IF(users.name = "", "",
CONCAT(" <", users.name, ">")))'))
.uniq.sort
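To check the logic outside MySQL, here is a minimal Python/sqlite3 sketch of the same query. It assumes a users table with login, name, last_login (stored as 'YYYY-MM-DD HH:MM:SS' text) and contribution columns. Because older SQLite versions lack IF() and CONCAT(), it swaps in CASE and the || operator; that is a substitution, not the original MySQL syntax. In both MySQL and SQLite, NULLs sort last under DESC, so users who have not logged in within the last month fall below the recent ones.

```python
import sqlite3

def primer_values(conn):
    # || replaces CONCAT, CASE replaces IF; a CASE with no ELSE yields NULL
    rows = conn.execute("""
        SELECT users.login ||
               CASE WHEN users.name = '' THEN ''
                    ELSE ' <' || users.name || '>' END
        FROM users
        ORDER BY CASE WHEN last_login > datetime('now', '-1 month')
                      THEN last_login END DESC,
                 contribution DESC
        LIMIT 1000
    """).fetchall()
    # Ruby's .uniq.sort becomes sorted(set(...))
    return sorted({r[0] for r in rows})
```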
Converting this to Arel without abstracting it into an object of its own will damage the readability significantly.
That being said, just to give you an idea, the first part would be three nested NamedFunctions: CONCAT, IF, CONCAT.
Arel::Nodes::NamedFunction.new(
  "CONCAT",
  [User.arel_table[:login],
   Arel::Nodes::NamedFunction.new(
     "IF",
     [User.arel_table[:name].eq(''),
      Arel.sql("''"),
      Arel::Nodes::NamedFunction.new(
        "CONCAT",
        [Arel.sql("' <'"),
         User.arel_table[:name],
         Arel.sql("'>'")]
      )]
   )]
)
A NamedFunction is a constructor for FUNCTION_NAME(ARG1, ARG2, ARG3), so any SQL that uses this syntax can be created with a NamedFunction, including empty functions like NOW() or other syntaxes like LATERAL(query).
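To illustrate that idea (and not Arel's actual implementation), here is a hypothetical Python node class, Fn, that renders as NAME(arg1, arg2, ...) and nests the same way the three NamedFunctions above do. Plain strings are treated as raw SQL fragments, much like Arel.sql.

```python
from dataclasses import dataclass, field

@dataclass
class Fn:
    """Hypothetical node that renders as NAME(arg1, arg2, ...)."""
    name: str
    args: list = field(default_factory=list)

    def to_sql(self) -> str:
        # Nested Fn nodes render recursively; plain strings are raw SQL
        parts = [a.to_sql() if isinstance(a, Fn) else a for a in self.args]
        return f"{self.name}({', '.join(parts)})"
```

With no arguments it renders an empty call such as NOW(), and nesting CONCAT / IF / CONCAT reproduces the expression from the question.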
I'm using a search bar on my website based on keyup binding and Ajax requests. It works fine, but I would like my search engine to return finer results by handling multiple keywords.
However, I could not find any simple method to set up this kind of search.
Does anyone know how to set this up?
Here is the actual SQL request that's being made:
if ($recherche !=""){
$req = $this->bdd->prepare("SELECT * FROM videos WHERE titre LIKE :recherche OR auteur LIKE :recherche UNION SELECT videos.id_video, videos.titre, videos.lien, videos.auteur, videos.date_upload FROM videos RIGHT JOIN mots_clefs ON videos.id_video = mots_clefs.id_video AND mots_clefs.mot_clef LIKE :recherche ORDER BY date_upload DESC LIMIT ".$start.", ".$limit);
$req->execute(array('recherche' => "%".$recherche."%"));
$result = json_encode($req->fetchAll(PDO::FETCH_ASSOC));
}
Request example:
SELECT * FROM videos WHERE titre LIKE '%word 1 word 2%' OR auteur LIKE '%word 1 word 2%' UNION SELECT videos.id_video, videos.titre, videos.lien, videos.auteur, videos.date_upload FROM videos RIGHT JOIN mots_clefs ON videos.id_video = mots_clefs.id_video AND mots_clefs.mot_clef LIKE '%word 1 word 2%' ORDER BY date_upload DESC LIMIT 0, 20;
You can execute the above query in a loop by passing one keyword at a time.
1. Get the entire keyword list provided by the user into a string.
2. Use the string.Split() method with a comma (,) as the delimiter to get the keywords into an array.
3. Loop through the array and pass each keyword to the query.
4. Make sure you append the rows fetched by each query to a DataTable or DataSet rather than overwriting them.
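A minimal Python/sqlite3 sketch of those steps, assuming the videos table from the question (id_video, titre, auteur) and leaving out the mots_clefs join for brevity. Each keyword gets its own parameterized query, and rows are accumulated in a dict keyed by id_video so later queries add to earlier results instead of overwriting them; search_each_keyword is a hypothetical name.

```python
import sqlite3

def search_each_keyword(conn, user_input):
    # Steps 1-2: split the user's input on commas into trimmed keywords
    keywords = [k.strip() for k in user_input.split(",") if k.strip()]
    results = {}  # id_video -> row
    for kw in keywords:
        # Step 3: one parameterized query per keyword
        pattern = f"%{kw}%"
        for row in conn.execute(
            "SELECT id_video, titre FROM videos WHERE titre LIKE ? OR auteur LIKE ?",
            (pattern, pattern),
        ):
            # Step 4: accumulate, de-duplicating rows matched by several keywords
            results[row[0]] = row
    return sorted(results.values())
```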
string select = "SELECT * FROM [MyTable] WHERE [Title] LIKE '%" + strSearch.Replace(",", "%' OR [Title] LIKE '%") + "%'";
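The one-liner above builds a single query with one LIKE per keyword, but because it concatenates user input into the SQL string it is open to SQL injection. A sketch of the same single-query idea with bound parameters, again in Python/sqlite3 against the assumed videos table (search_any_keyword is hypothetical):

```python
import sqlite3

def search_any_keyword(conn, user_input):
    keywords = [k.strip() for k in user_input.split(",") if k.strip()]
    if not keywords:
        return []
    # One "titre LIKE ?" per keyword joined with OR; values are bound, not spliced in
    where = " OR ".join(["titre LIKE ?"] * len(keywords))
    sql = f"SELECT id_video, titre FROM videos WHERE {where} ORDER BY id_video"
    return conn.execute(sql, [f"%{k}%" for k in keywords]).fetchall()
```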
I have tons of URLs in my database and want to filter them by a user-defined string in the format something/*/something, where * stands for "anything". So when a user defines checkout/*/complete, it should filter URLs like:
http://my_url.com/checkout/15/complete
http://my_url.com/checkout/85/complete
http://my_url.com/checkout/something/complete
http://my_url.com/super/checkout/something/complete
etc.
How do I do that in SQL? Or should I filter out all the results and use PHP to do the job?
My SQL query now is:
SELECT * FROM custom_logs WHERE pn='$webPage' AND id IN ( SELECT MAX(id) FROM custom_logs WHERE action_clicked_text LIKE '%{$text_value_active}%' GROUP BY token ) order by action_timestamp desc
This filters out all the log messages with user-defined text in the column action_clicked_text, but it uses a LIKE clause, which will not work with * inside.
You want LIKE, with % in place of *. Either:
where url like '%checkout/%/complete%'
to get the URLs that match the pattern, or:
where url not like '%checkout/%/complete%'
to get the other urls.
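If the pattern comes from the user, the * has to be translated to % first, and any literal % or _ in the input should be escaped, since both are LIKE wildcards. A minimal Python/sqlite3 sketch, assuming a table urls with a url column (both names hypothetical). Note that % also matches across slashes, so checkout/*/complete matches checkout/a/b/complete too; if * must stand for exactly one path segment, you would need something stricter, such as a regular-expression match.

```python
import sqlite3

def wildcard_to_like(pattern: str) -> str:
    # Escape LIKE's own metacharacters first, then turn * into %,
    # and wrap in % so the pattern may appear anywhere in the URL
    escaped = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "%" + escaped.replace("*", "%") + "%"

def matching_urls(conn, pattern):
    # ESCAPE '\' makes \% and \_ match literal characters
    return [r[0] for r in conn.execute(
        "SELECT url FROM urls WHERE url LIKE ? ESCAPE '\\' ORDER BY url",
        (wildcard_to_like(pattern),),
    )]
```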
I have an Oracle SQL query as below:
select id from docs where CONTAINS (text,
'<query>
<textquery lang="ENGLISH" grammar="CONTEXT"> Informix
<progression>
<seq><rewrite>transform((TOKENS, "{", "}", "AND"))</rewrite></seq>
</progression>
</textquery>
</query>')>0;
The query works as expected. But I also want to search for the words Inform / Infor / Info, so I altered the query as below:
select id from docs where CONTAINS (text,
'<query>
<textquery lang="ENGLISH" grammar="CONTEXT"> Informix
<progression>
<seq><rewrite>transform((TOKENS, "?{", "}", "AND"))</rewrite></seq>
</progression>
</textquery>
</query>')>0;
I added an extra "?" in the transform function, but this looks for informix / informi / inform / infor / info / inf / in. I want to restrict the search to a minimum of 4 characters, i.e. down to info. How can this be achieved?
Thanks.
To find all documents that contain at least one occurrence of any of the terms from informix down to info, use the OR operator and list all your allowed terms in the template:
<query>
<textquery lang="ENGLISH" grammar="CONTEXT"> informix informi inform infor info
<progression>
<seq><rewrite>transform((TOKENS, "{", "}", "OR"))</rewrite></seq>
</progression>
</textquery>
</query>
But using a template is not really meaningful here.
You get the same result with a direct query:
select score(1), id from docs
where contains(text,'informix OR informi OR inform OR infor OR info',1) > 0
order by 1 desc;
The advantage of this approach is that you can control the score, preferring documents that match longer strings by giving those terms higher weights:
select score(1), id from docs
where contains(text,'informix*5 OR informi*4 OR inform*3 OR infor*2 OR info',1) > 0
order by 1 desc;
By the way, the ? (fuzzy) operator is, IMO, meant for finding misspelled words, not exact prefixes of a term.
UPDATE
You can assemble the list of prefixes in PL/SQL or, if necessary, in plain SQL as follows:
with txt as (
select 'informix' text from dual),
txt2 as (
select
substr(text,1,length(text) -rownum+1) text
from txt connect by level <= length(text) -3
)
select
LISTAGG( text, ', ') WITHIN GROUP (ORDER BY text desc)
from txt2
Result:
informix, informi, inform, infor, info
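The same prefix list can also be generated in application code before the query is sent. A small Python sketch (function names hypothetical) that produces the plain OR expression and the weighted variant used above, stopping at a minimum length of 4 characters:

```python
def prefixes(word: str, min_len: int = 4):
    # Longest first: informix, informi, ..., down to min_len characters
    return [word[:n] for n in range(len(word), min_len - 1, -1)]

def contains_expr(word: str, min_len: int = 4, weighted: bool = False):
    terms = prefixes(word, min_len)
    if weighted:
        # Longest prefix gets the highest weight; the shortest keeps weight 1 (no suffix)
        top = len(terms)
        terms = [f"{t}*{top - i}" if top - i > 1 else t for i, t in enumerate(terms)]
    return " OR ".join(terms)
```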
This is kind of complicated, so bear with me.
I've got the basic concept figured out thanks to THIS QUESTION
SELECT LENGTH(col) - LENGTH(REPLACE(col, 'Y', ''))
Very clever solution.
Problem is, I'm trying to count the number of instances of a string token, then take that counter and multiply it by a modifier that represents the string's numeric value. Oh, and I've got a list of 50-ish tokens, each with a different value.
So, take the string "{5}{X}{W}{b/r}{2/u}{pg}"
Looking up the list of tokens and their numeric value, we get this:
{5} 5
{X} 0
{W} 1
{b/r} 1
{2/u} 2
{pg} 1
.... ....
Therefore, the sum value of the string above is 5+0+1+1+2+1 = 10
Now, what I'm really looking for is a way to do a JOIN and perform the aforementioned replace-token-get-length trick for each token in the TokenValue table.
Does that make sense?
Pseudo-SQL example:
SELECT StringColumn, TotalTokenValue
???
FROM TableWithString, TokenValueTable
Perhaps this would work better as a custom Function?
EDIT
I think I'm about halfway there, but it's kind of ugly.
SELECT StringColumn, LEN(StringColumn) AS TotalLen, Token,
{ fn LENGTH(Token) } AS TokenLength, TokenValue,
{ fn REPLACE(StringColumn, Token, '') AS Replaced,
{ fn LENGTH(Replaced) } AS RepLen,
{ TotalLen - RepLen / TokenLength} AS TokenCount },
{ TokenCount * TokenValue} CalculatedTokenValue
FROM StringTable CROSS JOIN
TokenTable
Then I need to wrap that in some kind of Group By and get SUM(CalculatedTokenValue)
I can picture it in my head, but I'm having trouble getting the SQL to work.
If you create a view like this one:
Create or replace view ColumnsTokens as
select StringColumn, Token, TokenValue,
(length(StringColumn) - length(replace(StringColumn, token, ''))) / length(Token) TokenCount
from StringTable
join TokenTable on replace(StringColumn, Token, '') <> StringColumn ;
which acts as a many-to-many relationship table between strings and tokens, I think you can easily write any query you need. For example, this will give you the total score:
select StringColumn, sum(TokenCount * TokenValue) TotalTokenScore
from ColumnsTokens
group by StringColumn ;
(You're only missing the StringColumns with no tokens)
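To include the strings with no tokens as well, one option is a LEFT JOIN with COALESCE so they score 0. Here is a self-contained Python/sqlite3 sketch of that variant, reusing the table and column names from the question; make_db and total_token_scores are hypothetical helpers. SQLite's integer division is exact here because the length difference is always a multiple of the token length, and the counting trick assumes no token is a substring of another token.

```python
import sqlite3

def make_db(strings, tokens):
    # In-memory tables mirroring StringTable and TokenTable from the question
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE StringTable (StringColumn TEXT)")
    conn.execute("CREATE TABLE TokenTable (Token TEXT, TokenValue INTEGER)")
    conn.executemany("INSERT INTO StringTable VALUES (?)", [(s,) for s in strings])
    conn.executemany("INSERT INTO TokenTable VALUES (?, ?)", tokens)
    return conn

def total_token_scores(conn):
    # TokenCount = (len(s) - len(s with token removed)) / len(token),
    # summed as TokenCount * TokenValue; LEFT JOIN keeps token-free strings
    return conn.execute("""
        SELECT s.StringColumn,
               COALESCE(SUM((length(s.StringColumn)
                             - length(replace(s.StringColumn, t.Token, '')))
                            / length(t.Token) * t.TokenValue), 0)
        FROM StringTable s
        LEFT JOIN TokenTable t
               ON replace(s.StringColumn, t.Token, '') <> s.StringColumn
        GROUP BY s.StringColumn
        ORDER BY s.StringColumn
    """).fetchall()
```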