I want to get the total number of rows across multiple files, which are saved as QVDs. With a single file I would accomplish this like so:
data:
LOAD count(id) AS counter FROM data.qvd (qvd);
LET number = Peek('counter');
Of course, I know that I also can use RowNo() or Count() the whole table in one command, but I want to try this with that solution.
Now when I try to fetch multiple files in one statement, as shown below, I always get only the count of the last loaded file and not the total:
data_multiple:
LOAD count(id) AS counter FROM data_*.qvd (qvd);
LET number_multiple = Peek('counter');
My question is: how do I get the total number of rows across all files, and not only the count of the last one?
What I tried so far
I already tried to rearrange the statement like this:
data:
LOAD id FROM data_*.qvd (qvd);
LOAD Count(id) AS counter Resident data;
LET number = Peek('counter');
But I still get the same result. Is there a way to achieve this?
I have asked the same question on the official Qlik Community page. There I received an answer:
let total_number = 0;
for each file in filelist('D:\Data\data_*.qvd')
QVDRecords: load QvdNoOfRecords('$(file)') as Counter, '$(file)' as Source autogenerate 1;
total_number = total_number + Peek('Counter');
next
trace QVD: $(total_number);
I am running the line
LIST @my_stage;
to get a list of all of the files that have been staged. Is there a way that I can get the most recent file that has been staged? The output shows the files in ascending date order so I would like to grab the last file in this list. Is there a way I can do this? (or something similar)
Just use RESULT_SCAN to process the resultset of the list command
https://docs.snowflake.com/en/sql-reference/functions/result_scan.html
Nick is correct but sometimes working examples help:
list @CITIBIKE_TRIPS;
select count(*) as file_count
from table(result_scan(LAST_QUERY_ID()));
FILE_COUNT
4,227
Now to be honest, I was fiddling around trying to sum the file sizes and so on, so I swapped to this code
list @CITIBIKE_TRIPS;
set id = (select LAST_QUERY_ID());
select *
from table(result_scan($id));
select count(*) as file_count
from table(result_scan($id));
while playing, so I could keep referring to the same query.
I have a query, which selects documents to be removed. Right now, I remove them manually, like this (using python):
for id in mycoll.find(query, fields={}):
    mycoll.remove(id)
This does not seem to be very efficient. Is there a better way?
EDIT
OK, I owe an apology for forgetting to mention the query details, because it matters. Here is the complete python code:
def reduce_duplicates(mydb, max_group_size):
    # 1. Count the group sizes
    res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response = True)
    # 2. For each entry from the filter scratch collection having count > max_group_size
    deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
    for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
        key = entry['_id']
        group_size = int(entry['value'])
        # 2b. query the original collection by the entry key, order it by test_date ascending, limit to the group size minus max_group_size.
        for id in mydb.static.find(key, limit = group_size - max_group_size, **deleteFindArgs):
            mydb.static.remove(id)
    return res['counts']['input']
So, what does it do? It reduces the number of duplicate keys to at most max_group_size per key value, leaving only the newest records. It works like this:
1. MR the data to (key, count) pairs.
2. Iterate over all the pairs with count > max_group_size.
3. Query the data by key, while sorting it ascending by the timestamp (the oldest first) and limiting the result to the count - max_group_size oldest records.
4. Delete each and every found record.
As you can see, this accomplishes the task of reducing the duplicates to at most N newest records. So, the last two steps are foreach-found-remove and this is the important detail of my question, that changes everything and I had to be more specific about it - sorry.
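The selection logic described above can be sketched in plain Python, without a database, to make explicit which records end up being deleted. This is only an illustration under assumptions: the record layout (dicts with `_id`, `key`, and `test_date`) and the function name `ids_to_delete` are hypothetical and not taken from the original code.

```python
from collections import defaultdict


def ids_to_delete(records, max_group_size):
    """Return the _ids of the oldest records exceeding max_group_size per key.

    `records` is an iterable of dicts with '_id', 'key' and 'test_date'
    (a hypothetical layout chosen for this sketch).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec['key']].append(rec)

    doomed = []
    for group in groups.values():
        excess = len(group) - max_group_size
        if excess <= 0:
            continue  # this group is already small enough
        # Oldest first, so the first `excess` entries are the ones to drop.
        group.sort(key=lambda rec: rec['test_date'])
        doomed.extend(rec['_id'] for rec in group[:excess])
    return doomed


sample = [
    {'_id': 1, 'key': 'a', 'test_date': 10},
    {'_id': 2, 'key': 'a', 'test_date': 30},
    {'_id': 3, 'key': 'a', 'test_date': 20},
    {'_id': 4, 'key': 'b', 'test_date': 5},
]
print(ids_to_delete(sample, max_group_size=2))
```

Once the ids are known, they could be handed to a single delete call with an `$in` filter on `_id`, rather than issuing one call per document.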
Now, about the collection remove command. It does accept query, but mine include sorting and limiting. Can I do it with remove? Well, I have tried:
mydb.static.find(key, limit = group_size - max_group_size, sort=[('test_date', ASCENDING)])
This attempt fails miserably. Moreover, it seems to screw mongo. Observe:
C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database
Needless to say, the foreach-found-remove approach works and yields the expected results.
Now, I hope I have provided enough context and (hopefully) have restored my lost honour.
You can use a query to remove all matching documents
var query = {name: 'John'};
db.collection.remove(query);
Be wary, though: if the number of matching documents is high, your database might get less responsive. It is often advised to delete documents in smaller chunks.
Let's say, you have 100k documents to delete from a collection. It is better to execute 100 queries that delete 1k documents each than 1 query that deletes all 100k documents.
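As a rough sketch of the chunking idea in Python: the helper below splits a list of ids into batches and issues one `delete_many` call per batch. `delete_many` is the PyMongo collection method; the helper name `delete_in_chunks`, the batch size, and the `FakeCollection` stand-in (used here only so the example runs without a server) are assumptions of this sketch.

```python
def delete_in_chunks(collection, ids, chunk_size=1000):
    """Delete documents by _id in batches of at most chunk_size.

    `collection` is expected to behave like a PyMongo collection, i.e.
    expose delete_many(filter) returning an object with .deleted_count.
    """
    total = 0
    for start in range(0, len(ids), chunk_size):
        batch = ids[start:start + chunk_size]
        result = collection.delete_many({'_id': {'$in': batch}})
        total += result.deleted_count
    return total


class FakeResult:
    """Minimal stand-in for a delete result (hypothetical)."""

    def __init__(self, deleted_count):
        self.deleted_count = deleted_count


class FakeCollection:
    """In-memory stand-in for a collection, for demonstration only."""

    def __init__(self, ids):
        self.ids = set(ids)
        self.calls = 0

    def delete_many(self, query):
        self.calls += 1
        batch = set(query['_id']['$in'])
        removed = self.ids & batch
        self.ids -= removed
        return FakeResult(len(removed))


coll = FakeCollection(range(2500))
deleted = delete_in_chunks(coll, list(range(2500)), chunk_size=1000)
print(deleted, coll.calls)
```

With a real collection, the same helper would be called with the collection object in place of `FakeCollection`; a suitable chunk size depends on the workload and is worth measuring.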
You can remove it directly using MongoDB scripting language:
db.mycoll.remove({_id:'your_id_here'});
Would deleteMany() be more efficient? I've recently found that remove() is quite slow for 6m documents in a 100m doc collection. Documentation at (https://docs.mongodb.com/manual/reference/method/db.collection.deleteMany)
db.collection.deleteMany(
<filter>,
{
writeConcern: <document>,
collation: <document>
}
)
I would recommend paging if there is a large number of records.
First: Get the count of data you want to delete:
-------------------------- COUNT --------------------------
var query= {"FEILD":"XYZ", 'date': {$lt:new ISODate("2019-11-10")}};
db.COL.aggregate([
{$match:query},
{$count: "all"}
])
Second: Start deleting chunk by chunk:
-------------------------- DELETE --------------------------
var query= {"FEILD":"XYZ", 'date': {$lt:new ISODate("2019-11-10")}};
var cursor = db.COL.aggregate([
{$match:query},
{ $limit : 5 }
])
cursor.forEach(function (doc){
db.COL.remove({"_id": doc._id});
});
and this should be faster:
var query={"FEILD":"XYZ", 'date': {$lt:new ISODate("2019-11-10")}};
var ids = db.COL.find(query, {_id: 1}).limit(5).toArray().map(r => r._id);
db.COL.deleteMany({"_id": { "$in": ids}});
Run this query in cmd
db.users.remove( {"_id": ObjectId("5a5f1c472ce1070e11fde4af")});
If you are using node.js write this code
User.remove({ _id: req.body.id }, function(err){...});
When I need to get the number of rows (data) inside a SQLite database, I run the following pseudo code.
cmd = "SELECT Count(*) FROM benchmark"
res = runcommand(cmd)
read res to get result.
But, I'm not sure if it's the best way to go. What would be the optimum way to get the number of data in a SQLite DB? I use python for accessing SQLite.
Your query is correct but I would add an alias to make it easier to refer to the result:
SELECT COUNT(*) AS cnt FROM benchmark
Regarding this line:
count size of res
You don't want to count the number of rows in the result set - there will always be only one row. Just read the result out from the column cnt of the first row.
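Since the question mentions Python, here is a minimal sketch using the standard `sqlite3` module. The in-memory database, the table layout, and the sample rows are assumptions made only so the example is self-contained; the point is that the single result row is read directly through the `cnt` alias.

```python
import sqlite3

# In-memory database with a hypothetical `benchmark` table for demonstration.
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute('CREATE TABLE benchmark (id INTEGER PRIMARY KEY, value REAL)')
conn.executemany('INSERT INTO benchmark (value) VALUES (?)',
                 [(1.5,), (2.5,), (3.5,)])

# The aggregate query returns exactly one row with one column.
row = conn.execute('SELECT COUNT(*) AS cnt FROM benchmark').fetchone()
row_count = row['cnt']
print(row_count)

conn.close()
```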
Hi again people of stackoverflow.
I have a routine with a step that I find unnecessary.
Let's say you want to get all the images from a gallery and limit the number of images per page.
$db = PDO object
$start = (pagenum x images per page)
$limit = (images per page)
$imgsdata = $db->query("SELECT id,name FROM gallery LIMIT $start,$limit")->fetchAll();
$numimgs = $db->query("SELECT id FROM gallery")->rowCount();
$imgsdata is an array of all the images in a gallery, for example.
$numimgs is the number of images that the gallery has.
You would need $imgsdata to do a foreach loop over each image in the array, while
$numimgs is needed to generate the page numbering (e.g. << 1 2 3 4 >>).
My grudge is with $db->query("SELECT id FROM gallery")->rowCount();
It feels like some sort of cheat. Isn't there a direct way to get the number of rows in a table, something like SELECT gallery.Rows?
p.s. currently I'm using SQLite, but I'd need it for MySQL and PostgreSQL as well.
This will tell you the number of rows:
SELECT COUNT(*) FROM gallery
A simple count() aggregate function will return the number of rows quickly
Select count(*) from table
select count(*) from gallery
Me too!
SELECT COUNT(*) FROM gallery
Yes, this should work the same just fine in MySQL, SQLite, and PostgreSQL.
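To show how the count fits into the pagination routine from the question, here is a small sketch in Python with the standard `sqlite3` module. The function name `fetch_page`, the zero-based page number, the `ORDER BY id` clause, and the sample data are assumptions of this sketch; `LIMIT ? OFFSET ?` is used because that form is accepted by SQLite, MySQL, and PostgreSQL alike.

```python
import math
import sqlite3


def fetch_page(conn, page_num, per_page):
    """Return (rows of the page, total number of pages); page_num starts at 0."""
    rows = conn.execute(
        'SELECT id, name FROM gallery ORDER BY id LIMIT ? OFFSET ?',
        (per_page, page_num * per_page),
    ).fetchall()
    # COUNT(*) lets the database do the counting instead of fetching every id.
    total = conn.execute('SELECT COUNT(*) FROM gallery').fetchone()[0]
    return rows, math.ceil(total / per_page)


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE gallery (id INTEGER PRIMARY KEY, name TEXT)')
conn.executemany('INSERT INTO gallery (name) VALUES (?)',
                 [('img%d' % i,) for i in range(1, 26)])

rows, num_pages = fetch_page(conn, page_num=2, per_page=10)
print(len(rows), num_pages)
```

The page count returned alongside the rows is what would drive the `<< 1 2 3 4 >>` numbering.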
Basically, I have albums, each with 50 images in it. When I show a list of images, I know which rows are showing (showing: 20 to 30 of 50), meaning 10 rows from 20 to 30. The problem is that I want to select an image but still show the position it was selected at, so I can move back and forth while keeping the position too.
For example, if I select the 5th image, whose id is 'sd564', I want to show (6 of 50 images), meaning you are seeing the 6th of 50 images. If I then get the next row id and show that, I want to show (7 of 50 images).
I can do all this easily with a pagination pointer, say in the URL (after=5, after=6), which moves with the position. But what if I don't have this (after=6) and just have an id? How can I still do that?
I also don't want to use (after=6) because it's a dynamic site where images are added and deleted, so positions change; if I share a link with someone else and they go back to the same old link, it would show the wrong position.
What kind of SQL query should I be running for this?
Currently I have:
select * from images where id = 'sd564';
Obviously I need to add a limit or something else to the query to get what I want, or maybe run another query to get the result while keeping this old query in place too. Anyway, I just want the positioning. I hope you can help me solve this.
Example: http://media.photobucket.com/image/color%20splash/aly3265/converse.jpg
sample http://img41.imageshack.us/img41/5631/viewing3of8240.png
Album Query Request (check post below)
select images.* from images, album
where album_id = '5'
and album_id = image_album_id
order by created_date DESC
limit ....;
Assuming created_date is unique per album_id and (album_id,created_date) is unique for all rows in images, then this:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.album_id = 5
group by i1.created_date
will reliably get you the images and their position. Please understand that this will only work reliably in case (album_id,created_date) are unique throughout the images table. If that is not the case, the position won't be reliable, and you might not see all photos due to the GROUP BY. Also note that a GROUP BY clause like this, only listing some of the columns that appear in the SELECT list (in this case images.*), is not valid in most RDBMS-es. For a detailed discussion on that matter, see: http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html
By doing this:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.album_id = 5
group by i1.created_date
having count(*) = 4
you select the image at the 4th position (note the having count(*) = 4)
By doing this:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.album_id = 5
group by i1.created_date
having count(*) between 1 and 10
you select all photos with positions 1 through 10 (note the having clause again.)
Of course, if you just want one particular image, you can simply do:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.image_id = 's1234'
group by i1.created_date
This will correctly report the position of the image within the album (of course, assuming that image_id is unique within the images table). You don't need the having clause in that case since you already pinpointed the image you want.
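For a runnable illustration of the "N of M" display, here is a sketch in Python with the standard `sqlite3` module. Note that it swaps the self-join with GROUP BY for two correlated subqueries, which count the same rows under the same uniqueness assumption but avoid the partial GROUP BY issue mentioned above. The table layout, the integer dates, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical schema reusing the column names from the queries above.
conn.execute('CREATE TABLE images ('
             'image_id TEXT PRIMARY KEY, album_id INTEGER, created_date INTEGER)')
conn.executemany(
    'INSERT INTO images VALUES (?, ?, ?)',
    [('a1', 5, 100), ('a2', 5, 200), ('s1234', 5, 300),
     ('a4', 5, 400), ('b1', 6, 150)],
)

POSITION_SQL = '''
    SELECT
        (SELECT COUNT(*) FROM images i2
          WHERE i2.album_id = i1.album_id
            AND i2.created_date <= i1.created_date) AS position,
        (SELECT COUNT(*) FROM images i3
          WHERE i3.album_id = i1.album_id) AS total
    FROM images i1
    WHERE i1.image_id = ?
'''

# One row comes back because image_id is unique in this sketch.
position, total = conn.execute(POSITION_SQL, ('s1234',)).fetchone()
print('%d of %d images' % (position, total))
```

Because the position is computed from the id at request time, a shared link that carries only the image id always shows the current position, which addresses the concern about stale `after=6` links.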
From what you are saying here:
I also don't want to use (after=6) because it's a dynamic site where images are added and deleted, so positions change; if I share a link with someone else and they go back to the same old link, it would show the wrong position.
I get the impression that this is not a SQL problem at all. The problem is that the positions of the photos are local to the search resultset. To reliably navigate by position, you would need to make a snapshot (no pun intended) of some kind. That is, you need to have some way to "freeze" the dataset while it is being browsed.
A simple way to do it, would be to execute the search, and cache the result outside of the actual current datastore. For example, you could use "scratch tables" in your database, simply store it in temporary files, or in some memory caching layer if you have the mem for it. With this model, you'd let the user browse the resultset from the cache, and you would need to clean out the cache when the user's session ends (or after some timeout, you don't want to kill your server because some users don't log out)
Another way to do it is to simply allow yourself to lie now and then. Let's say you have result pages of 10 images, and a typical search delivers 50 pages of results. Well, you could simply send a resultset for a fixed number of items, say 100 photos (so 10 pages), to the client. These search results would then be your snapshot, and contain references to the actual pictures. If you are storing the URLs in the database, and not the binary data, this reference is simply the URL. Or you could store the database id there. Anyway, the user is allowed to browse the initial resultset, and chances are that they never browse the entire set. If they do, you re-execute the query on the server side for the next chunk of pages. If many photos were added in the meantime that would end up at positions 1..100, then the user will see stale data: that's the price they pay for having so much time on their hands that they can allow themselves to browse 10 pages of 10 photos.
(of course, you should tweak the parameters to your liking but you get the idea I'm sure.)
If you don't want to 'lie' and it is really important that people can reliably browse all the results they searched, you could extend your database schema to support snapshots at that level. Now assuming that there are only two operations for photos, namely "add" and "delete", you would have a TIMESTAMP_ADDED and a TIMESTAMP_REMOVED in your photo table. On add, you do the INSERT in your db, and fill TIMESTAMP_ADDED with the current timestamp. The TIMESTAMP_REMOVED would be filled with the theoretical maximum value for whatever data type you like to use to store the timestamp (for this particular case I would probably go for an INT column and simply store the UNIX_TIMESTAMP). On delete, you don't DELETE the row from the db; rather, you mark it as deleted by updating the TIMESTAMP_REMOVED column, setting it to the current timestamp. Now when you have to do a search, you use a query like:
SELECT *
FROM photo
WHERE timestamp_added < timestamp_of_initial_search
AND timestamp_removed > timestamp_of_initial_search
AND ...various search criteria...
ORDER BY ...something
LIMIT ...page offset and num items in page...
The timestamp_of_initial_search is the timestamp of executing the initial search for a particular set of criteria. You should store that in the application session while the user is browsing a particular search resultset so you can use it in the subsequent queries required for fetching the pages. The first two WHERE criteria are there to implement the snapshot. The condition timestamp_added < timestamp_of_initial_search ensures we can only see photos that were added before the timestamp of executing the search. The condition timestamp_removed > timestamp_of_initial_search ensures we only see photos that were not already removed by the time the initial search was executed.
Of course, you still have to do something with the photos that were marked for delete. You could schedule periodic physical deletion of all photos whose timestamp_removed is smaller than the initial search timestamp of every search resultset that is still being browsed.
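The soft-delete snapshot scheme can be tried out with a short Python sketch using the standard `sqlite3` module. The helper names (`add_photo`, `remove_photo`, `snapshot_page`), the `MAX_TS` constant standing in for the "theoretical maximum" timestamp, the `ORDER BY id` clause, and the integer timestamps are all assumptions made for illustration.

```python
import sqlite3

MAX_TS = 2**31 - 1  # stand-in for the "theoretical maximum" timestamp (assumption)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE photo ('
             'id INTEGER PRIMARY KEY, '
             'timestamp_added INTEGER, timestamp_removed INTEGER)')


def add_photo(conn, photo_id, now):
    # New photos start out "never removed".
    conn.execute('INSERT INTO photo VALUES (?, ?, ?)', (photo_id, now, MAX_TS))


def remove_photo(conn, photo_id, now):
    # Soft delete: mark the row instead of deleting it.
    conn.execute('UPDATE photo SET timestamp_removed = ? WHERE id = ?',
                 (now, photo_id))


def snapshot_page(conn, search_ts, limit, offset):
    """Return photo ids as they existed at search_ts, one page at a time."""
    rows = conn.execute(
        'SELECT id FROM photo '
        'WHERE timestamp_added < ? AND timestamp_removed > ? '
        'ORDER BY id LIMIT ? OFFSET ?',
        (search_ts, search_ts, limit, offset),
    ).fetchall()
    return [r[0] for r in rows]


add_photo(conn, 1, now=100)
add_photo(conn, 2, now=110)
add_photo(conn, 3, now=120)
search_ts = 130                 # the initial search happens here
remove_photo(conn, 2, now=140)  # removed after the search started
add_photo(conn, 4, now=150)     # added after the search started

print(snapshot_page(conn, search_ts, limit=10, offset=0))
```

A search started later, with a larger `search_ts`, sees photo 4 and no longer sees photo 2, while the earlier session keeps its frozen view.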
If I understood your problem correctly, you can use the Row_Number() function (in SQL Server). To get the desired result, you can use a query something similar to this:
select images1.* from
(SELECT ROW_NUMBER() OVER (ORDER BY image_album_id) as rowID,(SELECT COUNT(*) FROM images) AS totCount, * FROM images) images1
JOIN album ON (album_id = images1.image_album_id)
where album_id = '5'
order by images1.image_album_id
limit ....;
Here images1.rowID gives you the position of the row and images1.totCount gives you the total number of rows.
Hope it helps.
Thanks.