If I have a book named "The Harold's Purple Crayon Collectors Set," I want the website URL to look like this:
www.site.com/book/harolds-purple-crayon/4324
I will need to write code to strip out things like noise words, special characters, words less than x chars long, limiting the final result to y words, etc, but after that code is written, what do I do with it?
Do I run each title through the code every time the URL is needed on my site, or instead, use the code to loop through all my titles and dump the results into a database and pull them from there instead of dynamically building them each time?
The best practice in this case is to save the friendly URLs to the database and retrieve them together with the other information about each book. Then all you have to do is build the URL from the stored string and the ID (as in your example).
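A minimal sketch of that idea in Python, run once when a title is saved rather than on every page view. The noise-word list, the `min_len` and `max_words` defaults, and the function names `make_slug` and `book_url` are all assumptions made for illustration; tune them to your own rules.

```python
import re

# Hypothetical noise-word list; replace with whatever words you want dropped.
NOISE_WORDS = {"a", "an", "the", "of", "and"}

def make_slug(title, min_len=3, max_words=3):
    """Build a URL-friendly string from a title (generate once, then store it)."""
    # Lowercase and drop everything except ASCII letters, digits and whitespace.
    cleaned = re.sub(r"[^a-z0-9\s]", "", title.lower())
    # Remove noise words and words shorter than min_len characters.
    words = [w for w in cleaned.split()
             if w not in NOISE_WORDS and len(w) >= min_len]
    # Keep at most max_words words and join them with dashes.
    return "-".join(words[:max_words])

def book_url(book_id, slug):
    """Re-create the URL from the stored slug and the book's ID."""
    return f"www.site.com/book/{slug}/{book_id}"

slug = make_slug("The Harold's Purple Crayon Collectors Set")
url = book_url(4324, slug)
```

Storing `slug` in its own column next to the title means the cleanup rules can change later without silently changing URLs that are already published.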
I'm writing a Command for a Crystal Report that queries an SQL Database. The Command will use parameters/inputs that are generated from a different program. I've put parameters directly in Commands before, but this one has to be handled differently.
Said input will be a string that is numbers with an & in between such as this: "6&12&15", order is irrelevant in this case. For understanding purposes, we'll say that the numbers are product ID's and are unique. When a user wants to search for multiple products in this database, the string above will be how it looks.
I have used the following code in the past for non-number based strings and it works well because of how other fields are set up:
CASE WHEN '{?WearhouseState}' = '' THEN 1
WHEN CHARINDEX(Products.WearhouseState,'{?WearhouseState}',0)>0 THEN 1
ELSE 0
END = 1
That code searches for the field's value as a substring anywhere in the given input parameter, which works for things like a state because "Texas" is never going to be a substring of any other state. However, this doesn't work so well with numbers. For example, if a product has an ID of 3, then the search will return that record if the parameter is '31', which I clearly do not want (it would also return product 1).
In the meantime, I have been splitting the string on a delimiter in Crystal Reports, which works fine but slows down the overall time to create the document. Most of the parameters I use I tend to put right in the query, and that drastically improves the speed. The Crystal code is as follows:
{?ProductID}="" or {Command.ProductID} in split({?ProductID},"&")
This works exactly as intended but, again, time is of the essence. Any additional information can be provided. It is technically InterSystems SQL, so keep that in mind because I know the commands/clauses can vary between SQL dialects.
I'd do the split string operation in SQL Server instead of CR. See e.g. T-SQL split string for a working code sample. Note that this logic does not need to run as a function; you could also include it directly in your CR command.
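To make the false-positive problem concrete, here is a small Python sketch comparing the plain substring test with a whole-token test. Note that this uses a delimiter-wrapping comparison rather than an actual split; it is only an illustration of the logic, the function names are made up, and whether the same expression can be written with CHARINDEX in InterSystems SQL is not verified here.

```python
def matches_substring(product_id, param):
    """Mimics the CHARINDEX approach: true if the ID appears anywhere in param."""
    return param == "" or str(product_id) in param

def matches_token(product_id, param, sep="&"):
    """True only if the ID is a whole entry in the separator-delimited list."""
    if param == "":
        return True  # empty parameter means "no filter", as in the original code
    # Wrapping both sides in the separator forces a whole-entry match,
    # so ID 3 no longer matches inside "31".
    return f"{sep}{product_id}{sep}" in f"{sep}{param}{sep}"
```

With the parameter "31&12", the substring test accepts product 3 while the token test rejects it; with "6&12&15", the token test accepts 12 and rejects 1.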
I have a table named buildings
each building has zero to n images
I have two solutions
the first one (the classic solution) using two tables:
buildings(id, name, address)
building_images(id, building_id, image_url)
and the second solution using only one table
buildings(id, name, address, image_urls_csv)
Given I won't need to search by image URL obviously,
I think the second solution (using the image_urls_csv column) is easier to use: there is no need to create another table just to keep the images, and I avoid the hassle of multiple queries or joins.
The question is, if I don't really want to filter, search or group by the field value, can I just make it CSV?
On the one hand, simply having an image_urls_csv column avoids joins or multiple queries, yes. A single round-trip to the db is always a plus.
On the other hand, you then have a string of urls that you need to parse. What happens when a URL has a comma in it? Oh, I know, you quote it. But now you need a parser that is beyond a simple naive split on commas. And then, three months from now, someone will ask you which buildings share a given image, and you'll go through contortions to handle quotes, not-quotes, and entries that are at the beginning or end of the string (and thus don't have commas on either side). You'll start writing some SQL to handle all this and then say to heck with it all and push it up to your higher-level language to parse each entry and tell if a given image is in there, and find that this is slow, although you'll realise that you can at least look for %<url>% to limit it, ... and now you've spent more time trying to hack around your performance improvement of putting everything into a single entry than you saved by avoiding joins.
A year later, someone will give you a building with so many URLs that it overflows the text limit you put in for that field, breaking the whole thing. Or someone will ask you to add extra fields to each URL for metadata ("last updated", "expires", ...).
So, yes, you absolutely can put in a list of URLs here. And if this is postgres or any other db that has arrays as a first-class field type, that may be okay. But do yourself a favour, and keep them separate. It's a moderate amount of up-front pain, and the long-term gain is probably going to make you very happy you did.
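As a rough sketch of what "keeping them separate" looks like in practice, here is the two-table layout from the question in Python with an in-memory SQLite database. The sample rows, URLs, and helper function names are invented for illustration, and SQLite is only a stand-in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buildings (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE building_images (
    id INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL REFERENCES buildings(id),
    image_url TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO buildings VALUES (?, ?, ?)", [
    (1, "Town Hall", "1 Main St"),
    (2, "Library", "2 Oak Ave"),
    (3, "Depot", "3 Elm Rd"),  # a building with zero images
])
conn.executemany(
    "INSERT INTO building_images (building_id, image_url) VALUES (?, ?)", [
        (1, "http://example.com/a,b.jpg"),  # a comma in the URL is harmless here
        (1, "http://example.com/shared.jpg"),
        (2, "http://example.com/shared.jpg"),
    ])

def images_by_building(conn):
    """One round-trip: every building with its (possibly empty) image list."""
    rows = conn.execute("""
        SELECT b.name, i.image_url
        FROM buildings b
        LEFT JOIN building_images i ON i.building_id = b.id
        ORDER BY b.id, i.id
    """).fetchall()
    result = {}
    for name, url in rows:
        result.setdefault(name, [])
        if url is not None:
            result[name].append(url)
    return result

def buildings_sharing(conn, url):
    """The 'which buildings share this image' question, with no string parsing."""
    rows = conn.execute("""
        SELECT b.name
        FROM buildings b
        JOIN building_images i ON i.building_id = b.id
        WHERE i.image_url = ?
        ORDER BY b.id
    """, (url,)).fetchall()
    return [name for (name,) in rows]
```

The LEFT JOIN still needs only a single round-trip, and the later "which buildings share a given image" request becomes an ordinary WHERE clause.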
"Given I won't need to search by image URL obviously" is an assumption that you cannot make about a database. Even if you never do end up searching by url, you might add other attributes of building images, such as titles, alt tags, width, height, etc, so you would end up having to serialize all this data in that one column, and then you would not be able to index any of it. Plus, if you serialize it with one language, then you or whoever comes after you using a different language will either have to install some 3rd party library to deserialize your stuff or write their own deserialization function.
The only case that I can think of where you should keep serialized data in a database is when you inherit old software that you don't have time to fix yet.
You all know the LIKE operator in SQL. For example:
select *
from customer
where email like '%goog%'
So my question is: how can the database return a result so fast?
If I had to program a function like this, I would loop over all customers and over each email. But this is very slow. I have heard about indexes. How can a database use an index when it doesn't know what the first or last letter is? Or is there another way to do it?
I don't want to program something like this. I only want to know how it works.
I have no idea what engine you are using and what's beneath its actual hood but here is some helpful information regarding this problem:
Often, SQL engines use full-text search on the column to answer queries like that extra fast. This is done by creating an inverted index that maps each word to the "documents" (row, column) that contain it. One widely used library is Apache Lucene. Unfortunately, most IR (Information Retrieval) libraries do NOT support a wild card at the beginning of the query by default (though they do anywhere else), so your specific example cannot be searched efficiently in such an index.
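A toy inverted index in Python shows both the speed-up and the limitation. This is only a sketch: the tokenizer (split on anything that is not a letter or digit) and the function names are assumptions, and real engines are far more elaborate.

```python
import re
from collections import defaultdict

def build_inverted_index(emails):
    """Map each word to the set of row numbers whose email contains it."""
    index = defaultdict(set)
    for row_id, email in enumerate(emails):
        # Hypothetical tokenizer: split on any run of non-alphanumeric characters.
        for word in re.split(r"[^a-z0-9]+", email.lower()):
            if word:
                index[word].add(row_id)
    return index

def lookup_word(index, word):
    """Whole-word lookup: one dictionary access instead of a scan over all rows."""
    return sorted(index.get(word.lower(), set()))

emails = ["bob@google.com", "amy@example.org", "joe@mail.google.com"]
index = build_inverted_index(emails)
```

Here `lookup_word(index, "google")` finds rows 0 and 2 directly, but `lookup_word(index, "goog")` finds nothing, because "goog" is only part of an indexed word. That is why a pattern like '%goog%' does not map cleanly onto a word-based index.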
You can create an index that supports a wild card at the beginning of the pattern by using a suffix tree. Suffix trees are excellent for searching for a substring, as in your example. However, they are not well suited to searching for a string with a wild card in the middle of it.
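To show why indexing suffixes helps with a leading wild card, here is a deliberately simplified Python sketch. It is not a real suffix tree: it stores every suffix of every email in a sorted list and uses binary search, which has the same lookup idea (any substring is a prefix of some suffix) but uses far more memory than a proper suffix tree would. The function names are made up.

```python
import bisect

def build_suffix_index(emails):
    """Sorted list of (suffix, row_id) pairs for every suffix of every email."""
    entries = []
    for row_id, email in enumerate(emails):
        for start in range(len(email)):
            entries.append((email[start:], row_id))
    entries.sort()
    return entries

def find_substring(index, needle):
    """Rows whose email contains needle, i.e. LIKE '%needle%'."""
    # All suffixes that start with needle sit next to each other in sorted
    # order, so binary-search to the first candidate and walk forward.
    pos = bisect.bisect_left(index, (needle,))
    matches = set()
    for suffix, row_id in index[pos:]:
        if not suffix.startswith(needle):
            break
        matches.add(row_id)
    return sorted(matches)

emails = ["bob@google.com", "amy@example.org", "goog@x.io"]
suffix_index = build_suffix_index(emails)
```

Calling `find_substring(suffix_index, "goog")` jumps straight to the matching suffixes and never looks at "amy@example.org".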
As I understand it, this style of query is not very efficient: if there is a wild card that affects the start of words, an entire scan is needed. However, if the column is indexed, the DBMS only has to bring the index into memory and scan that, not the entire contents of the table. Typically this is a relatively fast task.
Since we don't know which RDBMS you're working with, let's look at one way that a database could benefit from an index in this sort of situation - and let's explore it via the book/index metaphor:
Imagine that each row of data takes up a page of a book, and on each page there'll be an email address. And at the end of the book, there's an index of email addresses: for each email address, it tells you which pages contain that email address. Each page of this index just contains email addresses and page numbers. Say that there are 50 email addresses per page.
If you want to find all the pages where the email address contains the letters goog, despite not knowing what the first or last letter of the email address is, do you think it will be easier for you to a) look through every page in the entire book, or b) scan down the email index at the back of the book, taking note of which pages are useful (and then going to those pages if you need more information)?
Apologies, this is kind of a convoluted question. I have a SQL query in an ASP web page, which returns a dataset to a webgrid in the page. It looks like so:
Picture of Dataset/Webgrid output in ASP webpage here
I'd like to be able to take the "Community" column and keep the output the same, but make the output into a link to a software client based on the specific Community that's listed. We have a short list of them (maybe 4-5 total), so it'll mean only 4-5 different downloads.
Additionally, I may need to include a field for the OS as we have different downloads per OS (Mac / Windows). I assume if I can get the logic set for one, I can probably repeat that for the other column.
Any ideas on how I could approach this? I'm just not sure how to phrase this question appropriately, but I think this might make it more clear.
Thanks!
What you would need to do is something like
SELECT account, telephone, '<a href="' + communityURL + '">' + community + '</a>' AS CommunityCol, status
FROM myTable
ORDER BY account
... so, assuming the URL is stored in communityURL, the CommunityCol column (from memory, you might need to rename it) contains a concatenated string with the link you need.
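If the download URLs are not stored in the table, the same link can be built in the page code instead of in SQL. A hedged sketch in Python (the community names, the URLs, and the names `DOWNLOAD_URLS` and `community_link` are all invented for illustration; the same idea carries over to whatever language the page uses):

```python
from html import escape

# Hypothetical lookup: one download URL per community (4-5 entries in practice).
DOWNLOAD_URLS = {
    "Alpha": "https://example.com/downloads/alpha",
    "Beta": "https://example.com/downloads/beta",
}

def community_link(community):
    """Return an HTML link for a known community, or plain escaped text otherwise."""
    label = escape(community)
    url = DOWNLOAD_URLS.get(community)
    if url is None:
        return label  # unknown community: keep the original cell text
    return f'<a href="{escape(url, quote=True)}">{label}</a>'
```

A second lookup keyed on (community, OS) would handle the Mac / Windows split in the same way.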
AJAX autocomplete is fairly simple to implement. However, I wonder how to handle smart tag suggestion like this on SO.
To clarify the difference between autocomplete and suggestion:
autocomplete: foo [foobar, foobaz]
suggestion: foo [barfoo, foobar, foobaz], or even better, with 'did you mean' feature: [barfoo, foobar, foobaz, fobar, fobaz]
I suppose I need some kind of full text search in tags (all letters indexed, not just words). There would be no problem doing it with regex or other patterns for a limited number of tags (even client side).
But how to implement this feature for big number of tags?
Is there any particular reason (besides URL) the tags on SO are dash separated? What about Unicode characters in tags?
I store the tags in the table with the following columns: id, tagname.
My SQL query returns objects with following fields: id, tagname, count
(I use Doctrine ORM and pgsql as default db driver.)
I would go with selecting them from the database by REGEXP at every keypress. I did this on my sites and there was no performance problem (I do not have a heavily loaded server, though). If you do not like this idea, I would cache all the 1-5 letter combinations that users enter in a separate table and refresh them on a daily basis. If this table is indexed, then you have a very fast implementation.
To elaborate more on the second approach:
Briefly:
1. Make a table SEARCHTABLE representing a 1-n relationship between keywords (limit them to 3-4 letters) and the primary IDs of tags.
2. INDEX on both fields.
3. Every time the user makes a search, look in SEARCHTABLE and, if the combination is there, use that; it is very fast, as everything is indexed. If not, do the regexp search and put all the results into SEARCHTABLE.
Notes:
You should invalidate the table if you add tags, but this should happen much less often than a search. When invalidating the table you do not necessarily need to TRUNCATE it; you can easily rebuild it taking all keywords into account.
If you want to speed it up, you can "pregenerate" all two- or even three-letter searches.
If you care enough, you should use the results for the (n-1)-letter keyword to generate the results for the n-letter keyword. It speeds things up tremendously. Imagine that the user has typed "mo" and you have shown them the appropriate results from SEARCHTABLE. Then when they type "n", giving "mon", you only need to search through the already selected items to generate the new response.
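The caching scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory dict stands in for SEARCHTABLE, a plain substring test stands in for the REGEXP query, and the class and method names are made up.

```python
class TagSuggester:
    def __init__(self, tags):
        self.tags = list(tags)
        self.cache = {}  # stands in for SEARCHTABLE: keyword -> matching tags

    def suggest(self, keyword):
        """Return all tags containing keyword, using cached results when possible."""
        if keyword in self.cache:
            return self.cache[keyword]
        # Any tag containing "mon" also contains "mo", so the cached result for
        # the (n-1)-letter keyword is a safe, smaller set to search through.
        candidates = self.cache.get(keyword[:-1], self.tags)
        result = [tag for tag in candidates if keyword in tag]
        self.cache[keyword] = result
        return result

    def add_tag(self, tag):
        """Adding a tag invalidates the cache; here it is simply cleared."""
        self.tags.append(tag)
        self.cache.clear()

suggester = TagSuggester(["monad", "model", "lemon", "python"])
first = suggester.suggest("mo")    # scans all four tags
second = suggester.suggest("mon")  # scans only the three tags found for "mo"
```

Clearing the whole cache on `add_tag` is the simplest form of invalidation; rebuilding only the affected keywords, as suggested in the notes, would avoid the cold start.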
Hope it is clearer now.