Longest common substring, BigQuery SQL

Given I have a table with two string columns:

A                          B
John likes to go jumping   Max likes swimming but he also likes to go jumping
John is cool               max is smart
John                       max
In BigQuery SQL, how can I find the longest common substring, such that I get:

A                          B                                                    C
John likes to go jumping   Max likes swimming but he also likes to go jumping   likes to go jumping
John is cool               max is smart                                         is
John                       max                                                  null

Try the very SQL-ish approach below:
select A, B,
  (
    select string_agg(word, ' ' order by a_pos) phrase
    from unnest(split(A, ' ')) word with offset a_pos
    join unnest(split(B, ' ')) word with offset b_pos
    using(word)
    group by b_pos - a_pos
    order by length(phrase) desc
    limit 1
  ) as C
from `project.dataset.table`
When applied to the sample data in your question, the output matches the expected result above.
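The core trick in this query is grouping matching words by b_pos - a_pos: words that belong to a common run line up on the same "diagonal". That idea can be sketched as a standalone function (hypothetical helper mirroring the query, not a full LCS):

```javascript
// Group word matches by their position difference (b_pos - a_pos); the longest
// phrase assembled within one group is the answer the query above returns.
function longestCommonWordRun(a, b) {
  const aw = a.split(' ');
  const bw = b.split(' ');
  const groups = new Map(); // key: bPos - aPos -> matched words with their aPos
  aw.forEach((word, aPos) => {
    bw.forEach((w, bPos) => {
      if (w === word) {
        const key = bPos - aPos;
        if (!groups.has(key)) groups.set(key, []);
        groups.get(key).push({ word, aPos });
      }
    });
  });
  let best = null;
  for (const run of groups.values()) {
    run.sort((x, y) => x.aPos - y.aPos);
    const phrase = run.map(r => r.word).join(' ');
    if (best === null || phrase.length > best.length) best = phrase;
  }
  return best;
}
```

Note that, like the query, this does not check that the matched words on a diagonal are actually contiguous; that is one of the adjustments a real use case may need.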
Obviously your example is very simple, so in a real use case you might need to adjust the above to reflect reality.
Also note: there are many other options/approaches for this problem that SO already has multiple answers for, including mine. For text similarity they are mostly based on JS UDFs and Levenshtein distance or similar algorithms.

This is probably not a problem for SQL alone to solve (it is, though, very simple to solve in any scripting language). However, BigQuery supports JS-based UDFs, which usually come in handy for problems like this.
Here is an option (which at its core is not SQL) that you can take in BigQuery:
CREATE TEMP FUNCTION lcsub(a string, b string)
RETURNS STRING
LANGUAGE js AS """
a = a.split(' ');
b = b.split(' ');
let la = a.length;
let lb = b.length;
let output = [];
for (let i = 0; i < la; i++) {
  for (let j = 0; j < lb; j++) {
    if (a[i] == b[j]) {
      let u = [b[j]];
      // a single-word match still counts (this also covers j == lb - 1,
      // where the extension loop below never runs)
      if (u.length >= output.length) {
        output = u.slice();
      }
      let aidx = i;
      // extend the common run word by word while both sides keep matching
      for (let k = j + 1; k < lb; k++) {
        u.push(b[k]);
        if (u.join(' ') == a.slice(i, aidx + 2).join(' ')) {
          if (u.length >= output.length) {
            output = u.slice();
          }
        } else {
          u.pop();
          break;
        }
        aidx += 1;
        if (aidx > la - 1) {
          break;
        }
      }
    }
  }
}
// return NULL when there is no common word at all
return output.length ? output.join(' ') : null;
""";
select A, B, lcsub(A, B) as C from dataset.table
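For comparison, the same longest common run of words can also be computed with the classic dynamic-programming recurrence for longest common substring, applied to word arrays (a standalone sketch, not part of the UDF above; it measures length in words rather than characters):

```javascript
// dp[i][j] = length of the common word run ending at a[i-1] and b[j-1];
// the maximum over all cells locates the longest common run.
function longestCommonWordSubstring(sa, sb) {
  const a = sa.split(' ');
  const b = sb.split(' ');
  const dp = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  let best = 0;
  let endA = 0; // index just past the best run's end in `a`
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      if (a[i - 1] === b[j - 1]) {
        dp[i][j] = dp[i - 1][j - 1] + 1;
        if (dp[i][j] > best) {
          best = dp[i][j];
          endA = i;
        }
      }
    }
  }
  return best ? a.slice(endA - best, endA).join(' ') : null;
}
```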

Related

Add array of other records from the same table to each record

My project is a Latin language learning app. My DB has all the words I'm teaching, in the table 'words'. It has the lemma (the main form of the word), along with the definition and other information the user needs to learn.
I show one word at a time for them to guess/remember what it means. The correct word is shown along with some wrong words, like:
What does Romanus mean? Greek - /Roman/ - Phoenician - barbarian
What does domus mean? /house/ - horse - wall - senator
The wrong options are randomly drawn from the same table, and must be from the same part of speech (adjective, noun...) as the correct word; but I am only interested in their lemma. My return value looks like this (some properties omitted):
[
  { lemma: 'Romanus', definition: 'Roman', options: ['Greek', 'Phoenician', 'barbarian'] },
  { lemma: 'domus', definition: 'house', options: ['horse', 'wall', 'senator'] }
]
What I am looking for is a more efficient way of doing it than my current approach, which runs a new query for each word:
// All the necessary requires are here
class Word extends Model {
  static async fetch() {
    const words = await this.findAll({
      limit: 10,
      order: [Sequelize.literal('RANDOM()')],
      attributes: ['lemma', 'definition'], // also a few other columns I need
    });
    const wordsWithOptions = await Promise.all(words.map(this.addOptions.bind(this)));
    return wordsWithOptions;
  }

  static async addOptions(word) {
    const options = await this.findAll({
      order: [Sequelize.literal('RANDOM()')],
      limit: 3,
      attributes: ['lemma'],
      where: {
        partOfSpeech: word.dataValues.partOfSpeech,
        lemma: { [Op.not]: word.dataValues.lemma },
      },
    });
    return { ...word.dataValues, options: options.map((row) => row.dataValues.lemma) };
  }
}
So, is there a way I can do this with raw SQL? How about Sequelize? One thing that still helps me is to give a name to what I'm trying to do, so that I can Google it.
EDIT: I have tried the following and at least got somewhere:
const words = await this.findAll({
  limit: 10,
  order: [Sequelize.literal('RANDOM()')],
  attributes: {
    include: [[sequelize.literal(`(
      SELECT lemma FROM words AS options
      WHERE "partOfSpeech" = "options"."partOfSpeech"
      ORDER BY RANDOM() LIMIT 1
    )`), 'options']],
  },
});
Now, there are two problems with this. First, I only get one option, when I need three; but if the query has LIMIT 3, I get: SequelizeDatabaseError: more than one row returned by a subquery used as an expression.
The second error is that while the code above does return something, it always gives the same word as an option! I thought to remedy that with WHERE "partOfSpeech" = "options"."partOfSpeech", but then I get SequelizeDatabaseError: invalid reference to FROM-clause entry for table "words".
So, how do I tell PostgreSQL "for each row in the result, add a column with an array of three lemmas, WHERE existingRow.partOfSpeech = wordToGoInTheArray.partOfSpeech?"
Revised
Well, that seems like a different question and perhaps should be posted that way, but...
The main technique remains the same: JOIN instead of sub-select. The difference is generating the list of lemmas and then piping them into the initial query. In a single statement this can get nasty.
As a single statement (actually this turned out not to be too bad):
select w.lemma, w.definition, string_to_array(string_agg(o.definition, ','), ',') as options
from words w
join lateral
     (select definition
      from words o
      where o.part_of_speech = w.part_of_speech
        and o.lemma != w.lemma
      order by random()
      limit 3
     ) o on 1=1
where w.lemma in (select lemma
                  from words
                  order by random()
                  limit 4 --<<< replace with parameter
                 )
group by w.lemma, w.definition;
The other approach builds a small SQL function to randomly select the specified number of lemmas. This selection is then piped into the (renamed) function from the previous fiddle.
create or replace
function exam_lemma_definition_options(lemma_array_in text[])
  returns table (lemma text
                ,definition text
                ,option text[]
                )
language sql strict
as $$
  select w.lemma, w.definition, string_to_array(string_agg(o.definition, ','), ',') as options
  from words w
  join lateral
       (select definition
        from words o
        where o.part_of_speech = w.part_of_speech
          and o.lemma != w.lemma
        order by random()
        limit 3
       ) o on 1=1
  where w.lemma = any(lemma_array_in)
  group by w.lemma, w.definition;
$$;
create or replace
function exam_lemmas(num_of_lemmas integer)
  returns text[]
language sql strict
as $$
  select string_to_array(string_agg(lemma, ','), ',')
  from (select lemma
        from words
        order by random()
        limit num_of_lemmas
       ) ll
$$;
Using this approach, your calling code reduces to a single SQL statement:
select *
from exam_lemma_definition_options(exam_lemmas(4))
order by lemma;
This permits you to specify the number of lemmas to select (in this case 4), limited only by the number of rows in the words table. See the revised fiddle.
Original
Instead of using a sub-select to get the option words just JOIN.
select w.lemma, w.definition, string_to_array(string_agg(o.definition, ','), ',') as options
from words w
join lateral
     (select definition
      from words o
      where o.part_of_speech = w.part_of_speech
        and o.lemma != w.lemma
      order by random()
      limit 3
     ) o on 1=1
where w.lemma = any(array['Romanus', 'domus'])
group by w.lemma, w.definition;
See fiddle. Obviously this will not necessarily produce the same options as your question shows, due to the random() selection, but it will get matching parts of speech. I will leave translation to your source language to you; or you can use the function option and reduce your SQL to a simple "select *".
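What the LATERAL join computes can be sketched in plain JavaScript (an in-memory stand-in with a naive shuffle in place of ORDER BY random(); names are illustrative):

```javascript
// For each quiz word, draw up to `pick` random definitions from rows that
// share its part of speech, excluding the word itself.
function addOptions(words, all, pick = 3) {
  return words.map(w => {
    const pool = all.filter(o => o.partOfSpeech === w.partOfSpeech && o.lemma !== w.lemma);
    const shuffled = [...pool].sort(() => Math.random() - 0.5); // crude ORDER BY random()
    return { ...w, options: shuffled.slice(0, pick).map(o => o.definition) };
  });
}
```

The shuffle here is biased and only good enough for a sketch; the database's random() ordering does the real work in the SQL version.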

Remove overlapping substrings within a BigQuery STRING field

I'm trying to find the most efficient way to remove overlapping substrings from a string field value on BigQuery. My use case is the same as Combining multiple regex substitutions but within BigQuery.
If I sum up the post above:
With the following list of substrings: ["quick brown fox", "fox jumps"]
I want:
A quick brown fox jumps over the lazy dog to be replaced by A over the lazy dog.
My thoughts were to come up with a JS UDF that does a similar job to what's mentioned in the post above, i.e. create a mask of the whole string and loop over the substrings to identify which characters to remove... But do you have better ideas?
Thanks for your help
I couldn't find out how to do this in Standard SQL
Below is for BigQuery Standard SQL and does the whole thing in one shot - just one [simple] query
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'A quick brown fox jumps over the lazy dog' text
), list AS (
  SELECT ['quick brown fox', 'fox jumps'] phrases
)
SELECT text AS original_text, REGEXP_REPLACE(text, STRING_AGG(pattern, '|'), '') processed_text FROM (
  SELECT DISTINCT text, SUBSTR(text, MIN(start), MAX(finish) - MIN(start) + 1) pattern FROM (
    SELECT *, COUNTIF(flag) OVER(PARTITION BY text ORDER BY start) grp FROM (
      SELECT *, start > LAG(finish) OVER(PARTITION BY text ORDER BY start) flag FROM (
        SELECT *, start + phrase_len - 1 AS finish FROM (
          SELECT *, LENGTH(cut) + 1 + OFFSET * phrase_len + IFNULL(SUM(LENGTH(cut)) OVER(win), 0) start
          FROM `project.dataset.table`, list,
            UNNEST(phrases) phrase,
            UNNEST([LENGTH(phrase)]) phrase_len,
            UNNEST(REGEXP_EXTRACT_ALL(text, r'(.+?)' || phrase)) cut WITH OFFSET
          WINDOW win AS (PARTITION BY text, phrase ORDER BY OFFSET ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
  )))) GROUP BY text, grp
) GROUP BY text
with output:

Row   original_text                               processed_text
1     A quick brown fox jumps over the lazy dog   A over the lazy dog
I tested the above with a few more complex / tricky texts and it still worked.
Brief explanation:
1. gather all inclusions of phrases in list and their respective starts and ends
2. combine overlapping fragments and calculate their respective starts and ends
3. extract new fragments based on the starts and ends from step 2
4. order them DESC by length and generate a regexp expression
5. finally, do the REGEXP_REPLACE using the regexp generated in step 4
The above might look messy - but in reality it does all of this in one query, in pure SQL.
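Step 2 above (combining overlapping fragments) is ordinary interval merging; the window-function dance in the query does the equivalent of this sketch:

```javascript
// Merge [start, finish] fragments: a fragment starts a new group only when
// its start lies beyond the previous finish (the query's LAG(finish) flag).
function mergeFragments(fragments) {
  const sorted = [...fragments].sort((a, b) => a[0] - b[0]);
  const merged = [];
  for (const [s, f] of sorted) {
    const last = merged[merged.length - 1];
    if (last && s <= last[1]) {
      last[1] = Math.max(last[1], f); // overlaps the previous fragment: extend it
    } else {
      merged.push([s, f]);
    }
  }
  return merged;
}
```

For the sample text, 'quick brown fox' occupies [3, 17] and 'fox jumps' [15, 23] (1-based, as SUBSTR counts), which merge into a single fragment to remove.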
Using a custom JS UDF seems to work, but I've seen BigQuery go faster..!
CREATE FUNCTION `myproject.mydataset.keyword_remover_js`(label STRING) RETURNS STRING LANGUAGE js AS """
var keywords = ["a quick brown fox", "fox jumps"];
var mask = new Array(label.length).fill(1);
var reg = new RegExp("(" + keywords.join("|") + ")", 'g');
var found;
while (found = reg.exec(label)) {
  for (var i = found.index; i < reg.lastIndex; i++) {
    mask[i] = 0;
  }
  // rewind so overlapping occurrences are matched too
  reg.lastIndex = found.index + 1;
}
var result = [];
for (var i = 0; i < label.length; i++) {
  if (mask[i]) {
    result.push(label[i]);
  }
}
return result.join('').replace(/ +/g, ' ').replace(/^ +| +$/g, '');
""";

remove duplicates from almost similar records in SQL

I have a table with three columns: Name, Address, City. This table is around a million records long. The name and address fields can probably have duplicates.
An example of duplicate names are:
XYZ foundation Coorporation
XYZ foundation Corp
XYZ foundation Co-orporation
Or another example
XYZ Center
XYZ Ctr
An example of duplication in addresses would be
60909 East 34TH STREET BAY #1
60909 East 34TH ST. BAY #1
60909 East 34TH ST. BAY 1
As you can see, the name and address fields are duplicates, but only to the human eye, because we understand abbreviations and short forms. How do I build this into a select statement in SQL Server? If not SQL Server, is there another way to scan and remove such duplicates?
The approach that I used is better suited for surnames, but I used it for company names as well. Most likely it will not work well for addresses.
Stage 1
Add a column to the table that stores a "normalized" company name. In my case I've written a function that populates the column via a trigger. The function has a set of rules, like this:
adds one space in the front and one in the back
replaces single-char symbols ~`!@#$%^&*()=_+[]{}|;':",.<>? with a space (all except / and -)
replaces multi-char tokens with space: T/A C/- P/L
replaces single char symbols -/ with space
replaces multi-char tokens with space: PTY PTE INC INCORPORATED LTD LIMITED CO COMPANY MR DR THE AND 'TRADING AS' 'TRADE AS' 'OPERATING AS'
replaces CORPORATION with CORP
trim all leading and trailing spaces
replace multiple consecutive spaces with single space
Note: when dealing with multi-char tokens surround them with spaces
I looked through my data and made these rules up. Adjust them for your case.
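A sketch of such a normalization function, with the rule set above encoded directly (the token lists and symbol set are illustrative; adjust them to your data):

```javascript
// Normalize a company name: strip punctuation, drop noise tokens,
// canonicalize CORPORATION -> CORP, collapse whitespace.
function normalizeCompanyName(name) {
  let s = ' ' + name.toUpperCase() + ' ';                   // pad so token rules can rely on spaces
  s = s.replace(/[~`!@#$%^&*()=_+\[\]{}|;':",.<>?]/g, ' '); // single-char symbols except / and -
  s = s.replace(/ (T\/A|C\/-|P\/L) /g, ' ');                // multi-char slash tokens
  s = s.replace(/[\/-]/g, ' ');                             // now / and - themselves
  const noise = ['PTY', 'PTE', 'INC', 'INCORPORATED', 'LTD', 'LIMITED', 'CO', 'COMPANY',
                 'MR', 'DR', 'THE', 'AND', 'TRADING AS', 'TRADE AS', 'OPERATING AS'];
  for (const t of noise) {
    s = s.split(' ' + t + ' ').join(' ');                   // tokens surrounded by spaces
  }
  s = s.split(' CORPORATION ').join(' CORP ');
  return s.replace(/ +/g, ' ').trim();                      // collapse and trim spaces
}
```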
Stage 2
I used the so-called Jaro-Winkler metric to calculate the distance between two normalized company names. I implemented the function that calculates this metric in CLR.
In my case my goal was to check for duplicates as a new entry is added to the system. The user enters the company name, the program normalizes it and calculates the Jaro-Winkler distance between the given name and all existing names. The closer the distance is to 1, the closer the match. The user saw existing records ordered by relevance and could decide whether the company name he just entered already exists in the database, or whether he still wanted to create a new one.
There exist other metrics that try to perform fuzzy search, like Levenshtein distance. Most likely, you'll have to use different metrics for names and addresses, because the types of mistakes are significantly different for them.
SQL Server has built-in functions to do fuzzy search, but I didn't use them and I'm not sure if they are available in standard editions or only enterprise, e.g. CONTAINSTABLE:

Returns a table of zero, one, or more rows for those columns containing precise or fuzzy (less precise) matches to single words and phrases, the proximity of words within a certain distance of one another, or weighted matches.
Note
When I was looking into this topic I came to the conclusion that all these metrics (Jaro-Winkler, Levenshtein, etc.) look for simple mistypes, like a missed/extra letter or two swapped letters. In my case and yours, this approach as-is would perform poorly, because you effectively have a dictionary of contractions first, and then on top of that there can be simple mistypes. That's why I ended up doing it in two stages - normalization and then applying the fuzzy search metric.
To make a list of rules that I mentioned above I made a dictionary of all words that appear in my data. Essentially, take each Name and split it into multiple rows by space. Then group by found tokens and count how many times they appear. Manually look through the list of tokens. This list should not be too long when you remove rare tokens from it. Hopefully common words and contractions would be easy to spot. I would imagine that the word Corporation and "Corp" would appear many times, as opposed to the actual company name XYZ. Those odd mistypes like "Coorporation" should be picked up by the fuzzy metric later.
In a similar way make a separate dictionary for Addresses, where you would see that Street and St. appear many times. For addresses you can "cheat" and get a list of common words from the index of some city map (street/st, road/rd, highway/hwy, grove/gv, etc.)
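Building such a dictionary is a small job in any scripting language; a sketch:

```javascript
// Split every name on whitespace, count token occurrences, and return tokens
// sorted by frequency so common contractions (CORP, LTD, ST, ...) surface first.
function tokenFrequencies(names) {
  const freq = new Map();
  for (const name of names) {
    for (const tok of name.toUpperCase().split(/\s+/).filter(Boolean)) {
      freq.set(tok, (freq.get(tok) || 0) + 1);
    }
  }
  return [...freq.entries()].sort((a, b) => b[1] - a[1]);
}
```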
This is my implementation of the Jaro-Winkler metric:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    /*
      The Winkler modification will not be applied unless the percent match
      was at or above the WeightThreshold percent without the modification.
      Winkler's paper used a default value of 0.7
    */
    private static readonly double m_dWeightThreshold = 0.7;

    /*
      Size of the prefix to be considered by the Winkler modification.
      Winkler's paper used a default value of 4
    */
    private static readonly int m_iNumChars = 4;

    [Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.None, SystemDataAccess = SystemDataAccessKind.None, IsDeterministic = true, IsPrecise = true)]
    public static SqlDouble StringSimilarityJaroWinkler(SqlString string1, SqlString string2)
    {
        if (string1.IsNull || string2.IsNull)
        {
            return 0.0;
        }
        return GetStringSimilarityJaroWinkler(string1.Value, string2.Value);
    }

    private static double GetStringSimilarityJaroWinkler(string string1, string string2)
    {
        int iLen1 = string1.Length;
        int iLen2 = string2.Length;
        if (iLen1 == 0)
        {
            return iLen2 == 0 ? 1.0 : 0.0;
        }

        int iSearchRange = Math.Max(0, Math.Max(iLen1, iLen2) / 2 - 1);

        bool[] Matched1 = new bool[iLen1];
        for (int i = 0; i < Matched1.Length; ++i)
        {
            Matched1[i] = false;
        }
        bool[] Matched2 = new bool[iLen2];
        for (int i = 0; i < Matched2.Length; ++i)
        {
            Matched2[i] = false;
        }

        int iNumCommon = 0;
        for (int i = 0; i < iLen1; ++i)
        {
            int iStart = Math.Max(0, i - iSearchRange);
            int iEnd = Math.Min(i + iSearchRange + 1, iLen2);
            for (int j = iStart; j < iEnd; ++j)
            {
                if (Matched2[j]) continue;
                if (string1[i] != string2[j]) continue;
                Matched1[i] = true;
                Matched2[j] = true;
                ++iNumCommon;
                break;
            }
        }
        if (iNumCommon == 0) return 0.0;

        int iNumHalfTransposed = 0;
        int k = 0;
        for (int i = 0; i < iLen1; ++i)
        {
            if (!Matched1[i]) continue;
            while (!Matched2[k])
            {
                ++k;
            }
            if (string1[i] != string2[k])
            {
                ++iNumHalfTransposed;
            }
            ++k;
            // even though the lengths of Matched1 and Matched2 can differ,
            // the number of elements with the true flag is the same in both
            // arrays, so k will never go outside the array boundary
        }
        int iNumTransposed = iNumHalfTransposed / 2;

        double dWeight =
            (
                (double)iNumCommon / (double)iLen1 +
                (double)iNumCommon / (double)iLen2 +
                (double)(iNumCommon - iNumTransposed) / (double)iNumCommon
            ) / 3.0;

        if (dWeight > m_dWeightThreshold)
        {
            int iComparisonLength = Math.Min(m_iNumChars, Math.Min(iLen1, iLen2));
            int iCommonChars = 0;
            while (iCommonChars < iComparisonLength && string1[iCommonChars] == string2[iCommonChars])
            {
                ++iCommonChars;
            }
            dWeight = dWeight + 0.1 * iCommonChars * (1.0 - dWeight);
        }
        return dWeight;
    }
};
You could look for a more customized solution, together with the DIFFERENCE function for instance (see: DIFFERENCE function, SQL Server).
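To make the idea concrete, here is a rough sketch of what the SOUNDEX/DIFFERENCE pair does (a simplified illustration, not SQL Server's exact algorithm): encode each string into a phonetic code, then count agreeing code positions.

```javascript
// Map consonants to Soundex classes; vowels and H/W/Y carry no code.
const SOUNDEX_CODES = {
  B: '1', F: '1', P: '1', V: '1',
  C: '2', G: '2', J: '2', K: '2', Q: '2', S: '2', X: '2', Z: '2',
  D: '3', T: '3', L: '4', M: '5', N: '5', R: '6',
};

function soundex(word) {
  const s = word.toUpperCase().replace(/[^A-Z]/g, '');
  if (!s) return '0000';
  let out = s[0];
  let prev = SOUNDEX_CODES[s[0]] || '';
  for (const ch of s.slice(1)) {
    const code = SOUNDEX_CODES[ch] || '';
    if (code && code !== prev) out += code;    // skip repeats of the same class
    if (ch !== 'H' && ch !== 'W') prev = code; // H/W do not separate duplicates
    if (out.length === 4) break;
  }
  return (out + '000').slice(0, 4);
}

function difference(a, b) {
  const sa = soundex(a);
  const sb = soundex(b);
  let n = 0;
  for (let i = 0; i < 4; i++) if (sa[i] === sb[i]) n++;
  return n; // 0 (no similarity) .. 4 (codes identical), like T-SQL's DIFFERENCE
}
```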
Is it possible for Name and City to be logically similar yet different, as well?
Since there's a lot of room for variations here and only you have access to the real data, only you can check what works and what kind of exceptions you basically have there.
But hopefully this will get you started.
-- Creating the test set
DECLARE @TESTTABLE TABLE (Name VARCHAR(256), City VARCHAR(256), Address VARCHAR(256))

INSERT INTO @TESTTABLE VALUES ('Billy bob',  'New York',      'Baker street 125')
INSERT INTO @TESTTABLE VALUES ('Billy bob',  'New York',      'Baker street 120')
INSERT INTO @TESTTABLE VALUES ('Billy bob',  'New York',      'Baker st 125')
INSERT INTO @TESTTABLE VALUES ('Billy bob',  'New York',      'Mallroad 1')
INSERT INTO @TESTTABLE VALUES ('James Dean', 'Washington DC', 'Primadonna road 15 c 100')
INSERT INTO @TESTTABLE VALUES ('James Dean', 'Washington DC', 'Primadonna r 15')
INSERT INTO @TESTTABLE VALUES ('Got Nuttin', 'Philly',        'Mystreet 1500') -- Doesn't show, since no real duplicates
And then, after the test data, the actual query.
-- The query
;WITH CTE AS
 (SELECT DISTINCT SRC.RN, T1.*, DIFFERENCE(T1.Address, T2.Address) DIFF_FACTOR
  FROM @TESTTABLE T1
  JOIN @TESTTABLE T2 ON T2.Name = T1.Name AND T2.City = T1.City AND T1.Address <> T2.Address
  JOIN (SELECT DENSE_RANK() OVER (ORDER BY Name, City) RN, Name, City
        FROM @TESTTABLE T3
        GROUP BY Name, City
        HAVING COUNT(*) > 1) SRC
    ON SRC.City = T1.City AND SRC.Name = T1.Name)
SELECT DISTINCT RN, Name, City, COUNT(DISTINCT C.Address) Address_CT
     , STUFF((SELECT ',' + B.Address
              FROM CTE B
              WHERE B.RN = C.RN AND B.DIFF_FACTOR = C.DIFF_FACTOR
              ORDER BY B.Address ASC
              FOR XML PATH('')), 1, 1, '') AllAdresses
     , DIFF_FACTOR
FROM CTE C
WHERE DIFF_FACTOR > 1 -- Comment this row to see that 'Mallroad 1' was considered too different from the rest; this filter keeps it out of the result set
GROUP BY RN, Name, City, DIFF_FACTOR
ORDER BY RN ASC, DIFF_FACTOR DESC
That is probably not the most effective - or accurate - way to go about doing this, but it's a good place to start and shows what can be done. If there's a chance for Name and City to also be different but duplicates to human eyes, you could modify the query to match any two identical column values, comparing the third. But it gets really difficult to automate comparisons in cases where you have one identifying column and both of the others can differ from one another to varying degrees.
I suspect you need to make several queries to first sort out the biggest mess, and eventually find the last most evasive "duplicates" by hand, a few at a time.

Berkeley DB equivalent of SELECT COUNT(*) All, SELECT COUNT(*) WHERE LIKE "%...%"

I'm looking for Berkeley DB equivalent of
SELECT COUNT All, SELECT COUNT WHERE LIKE "%...%"
I have got 100 records with keys: 1, 2, 3, ... 100.
I have got the following code:
// Key = 1
i = 1;
strcpy_s(buf, to_string(i).size() + 1, to_string(i).c_str());
key.data = buf;
key.size = to_string(i).size() + 1;
key.flags = 0;
data.data = rbuf;
data.size = sizeof(rbuf) + 1;
data.flags = 0;

// Cursor
if ((ret = dbp->cursor(dbp, NULL, &dbcp, 0)) != 0) {
    dbp->err(dbp, ret, "DB->cursor");
    goto err1;
}

// Get
dbcp->get(dbcp, &key, &data_read, DB_SET_RANGE);

db_recno_t cnt;
dbcp->count(dbcp, &cnt, 0);
cout << "count: " << cnt << endl;
Count cnt is always 1, but I expect it to count all the partial key matches for Key=1: 1, 10, 11, 21, ... 91.
What is wrong in my code/understanding of DB_SET_RANGE ?
Is it possible to get SELECT COUNT WHERE LIKE "%...%" in BDB ?
Also is it possible to get SELECT COUNT All records from the file ?
Thanks
You're expecting Berkeley DB to be way more high-level than it actually is. It doesn't contain anything like what you're asking for. If you want the equivalent of WHERE field LIKE '%1%' you have to make a cursor, read through all the values in the DB, and do the string comparison yourself to pick out the ones that match. That's what an SQL engine actually does to implement your query, and if you're using libdb instead of an SQL engine, it's up to you. If you want it done faster, you can use a secondary index (much like you can create additional indexes for a table in SQL), but you have to provide some code that links the secondary index to the main DB.
DB_SET_RANGE is useful to optimize a very specific case: you're looking for items whose key starts with a specific substring. You can DB_SET_RANGE to find the first matching key, then DB_NEXT your way through the matches, and stop when you get a key that doesn't match. This works only on DB_BTREE databases because it depends on the keys being returned in lexical order.
The count method tells you how many exact duplicate keys there are for the item at the current cursor position.
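For string keys, the DB_SET_RANGE + DB_NEXT pattern over a lexically ordered DB_BTREE behaves like this sketch over a sorted array (a stand-in for the cursor; note it finds keys that *start with* the prefix, not LIKE "%...%" matches anywhere):

```javascript
// Position at the first key >= prefix (what DB_SET_RANGE does), then walk
// forward (DB_NEXT) counting keys until they stop sharing the prefix.
function countKeysWithPrefix(sortedKeys, prefix) {
  let i = sortedKeys.findIndex(k => k >= prefix); // cursor after DB_SET_RANGE
  if (i === -1) return 0;
  let count = 0;
  while (i < sortedKeys.length && sortedKeys[i].startsWith(prefix)) {
    count++;
    i++; // DB_NEXT
  }
  return count;
}
```

With the question's keys "1".."100" sorted lexically, the prefix "1" matches "1", "10".."19" and "100", which is why cnt above stays 1: count() only reports exact duplicates of the current key.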
You can use the DB->stat() method. For example, to get the number of unique keys in a DB_BTREE database:
bool row_amount(DB *db, size_t &amount) {
    amount = 0;
    if (db == NULL) return false;
    DB_BTREE_STAT *sp;
    int ret = db->stat(db, NULL, &sp, 0);
    if (ret != 0) return false;
    amount = (size_t)sp->bt_nkeys;
    return true;
}

SELECT MAX query returns only 1 variable + codeigniter

I use CodeIgniter and have an issue with SELECT MAX... I could not find any solution via Google search...
It looks like it returns only the id :/ and it gives an error for the other columns of the table :/
Appreciate the help, thanks!
Model:
function get_default()
{
    $this->db->select_max('id');
    $query = $this->db->getwhere('gallery', array('cat' => "1"));
    if ($query->num_rows() > 0) {
        return $query->row_array(); // return the row as an associative array
    }
}
Controller:
$default_img = $this->blabla_model->get_default();
$data['default_id'] = $default_img['id']; // it returns this
$data['default_name'] = $default_img['gname']; // it gives an error for gname although it is in the table
To achieve your goal, your desired SQL can look something like:
SELECT *
FROM gallery
WHERE cat = '1'
ORDER BY id DESC
LIMIT 1
And to utilise CodeIgniter database class:
$this->db->select('*');
$this->db->where('cat', '1');
$this->db->order_by('id', 'DESC');
$this->db->limit(1);
$query = $this->db->get('gallery');
That is correct: select_max returns only that value and no other columns. From the specs:
$this->db->select_max('age');
$query = $this->db->get('members');
// Produces: SELECT MAX(age) as age FROM members
You may want to read the value first, and run another query.
For an id, you can also use $id = $this->db->insert_id();
See also: http://www.hostfree.com/user_guide/database/active_record.html#select
CodeIgniter will select * if nothing else is selected. By setting select_max() you are populating the select property and therefore saying you ONLY want that value.
To solve this, just combine select_max() and select():
$this->db->select('somefield, another_field');
$this->db->select_max('age');
or even:
$this->db->select('sometable.*', FALSE);
$this->db->select_max('age');
Should do the trick.
It should be noted that you may, of course, also use your own "custom" SQL statements in CodeIgniter; you're not limited to the active record functions you've outlined thus far. Another active record function CodeIgniter provides is $this->db->query(), which allows you to submit your own SQL queries (including variables) like so:
function foo_bar()
{
    $cat = 1;
    $limit = 1;
    $sql = "
        SELECT *
        FROM gallery
        WHERE cat = $cat
        ORDER BY id DESC
        LIMIT $limit
    ";
    $data['query'] = $this->db->query($sql);
    return $data['query'];
}
Recently I have been using this quite a bit, as I've been writing some queries that are difficult (if not annoying or impossible) to pull off with CI's explicit active record functions.
I realize you may know this already; I just thought it would help to include it for posterity.
2 helpful links are:
http://codeigniter.com/user_guide/database/results.html
http://codeigniter.com/user_guide/database/examples.html