Save huge array to database - sql

First the introduction, in case there's is a better approach: I have a product table with *product_id* and stock, where stock can be as big as 5000 or 10000, I need to create a list (in another table) where I have a row for each item, this is, if a *propduct_id* has stock 1000 I'll have 1000 rows with this *product_id*, and plus, this list needs to be random.
I chose a PHP (symfony2) solution, as I found how to get a random single product_id based on stock or even how to random order the product list, but I didn't find how to "multiply" this rows by stock.
Now, the main problem:
So, in PHP it's no so difficult, get product_id list, "multiply" by stock and shuffle, the problem comes when I want to save:
If I use $em->flush every 100 records or more I get a memory overflow after a while
If I use $em->flush in every record it takes ages to save
This is my code to save which maybe you can improve:
foreach ($huge_random_list as $indice => $id_product)
{
$preasignacion = new ListaPreasignacion();
$preasignacion->setProductId($id_product);
$preasignacion->setOrden($indice+1);
$em->persist($preasignacion);
if ($indice % 100 == 0) $em->flush();
}
$em->flush();
Edit with final solution based on #Pazi suggestion:
$conn = $em->getConnection();
foreach ($huge_random_list as $indice => $id_product)
{
$conn->executeUpdate("insert into product_list(product_id, order) "
." values({$id_product}, {$indice})");
}

I would suggest to abstain from doctrine ORM and use the DBAL connection an pure sql queries for this purpose. I do this always in my applications, where I have to store much data in short time. Doctrine adds too much overhead with objects, checks and dehydrating. You can retrieve the DBAL connection via the DI container. For example in a contoller:
conn = $this->get('database_connection');
Read more about DBAL

Related

How to Convert foor loop to NHibernate Futures for performance

NHibernate Version: 3.4.0.4000
I'm currently working on optimizing our code so that we can reduce the number of round trips to the database and am looking at a for loop that is one of the culprits. I'm having a hard time figuring out how to batch all of these iterations into a future that gets executed once when sent to SQL Server. Essentially each iteration of the loop causes 2 queries to hit the database!
foreach (var choice in lineItem.LineItemChoices)
{
choice.OptionVersion = _session.Query<OptionVersion>().Where(x => x.Option.Id == choice.OptionId).OrderByDescending(x => x.OptionVersionNumber).FirstOrDefault();
choice.ChoiceVersion = _session.Query<ChoiceVersion>().OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber).Where(x => x.Choice.Id == choice.ChoiceId).FirstOrDefault();
}
One option is to extract OptionId and ChoiceId from all the LineItemChoices into two lists in local memory. Then issue just two queries, one for options and one for choices, giving these lists in .Where(x => optionIds.Contains(x.Option.Id)). This corresponds to SQL IN operator. This requires some postprocessing. You will get two result lists (transform to dictionary or lookup if you expect many results), that you need to process to populate the choice objects. This postprocessing is local and tends to be very cheap compared to database roundtrips. This option can be a bit tricky if the existing FirstOrDefault part is absolutely necessary. Do you expect there to be more than result for a single optionId? If not, this code could instead have used SingleOrDefault, which could just be dropped if converting to use IN-queries.
The other option is to use futures (https://nhibernate.info/doc/nhibernate-reference/performance.html#performance-future). For Linq it means to use ToFuture or ToFutureValue at the end, which also conflicts with FirstOrDefault I believe. The important thing is that you need to loop over all line item choices to initialize ALL queries BEFORE you access the value of any of them. So this is likely to also result in some postprocessing, where you would first store the future values in some list, and then in a second loop access the real value from each query to populate the line item choice.
If you to expect that the queries can yield more than one result (before applying FirstOrDefault), I think you can just use Take(1) instead, as that will still return an IQueryable where you can apply the future method.
The first option is probably the most efficient, since it will just be two queries and allow the database engine to make just one pass over the tables.
Keep the limit on the maximum number of parameters that can be given in an SQL query in mind. If there can be thousands of line item choices, you may need to split them in batches and query for at most 2000 identifiers per round trip.
Adding on the Oskar answer, NHibernate Futures was implement in NHibernate 2.1. It is available on method Future for collections and FutureValue for single values.
In your case, you could separate the IDs of the list in memory ...
var optionIds = lineItem.LineItemChoices.Select(x => x.OptionId);
var choiceIds = lineItem.LineItemChoices.Select(x => x.ChoiceId);
... and execute two queries using Future<T> to get two lits in one hit over the database.
var optionVersions = _session.Query<OptionVersion>()
.Where(x => optionIds.Contains(x.Option.Id))
.OrderByDescending(x => x.OptionVersionNumber)
.Future<OptionVersion>();
var choiceVersions = _session.Query<ChoiceVersion>()
.Where(x => choiceIds.Contains(x.Choice.Id))
.OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber)
.Future<ChoiceVersion>();
After with all you need in memory, you could loop on the original collection you have and search in memory the data to fill up the choice object.
foreach (var choice in lineItem.LineItemChoices)
{
choice.OptionVersion = optionVersions.OrderByDescending(x => x.OptionVersionNumber).FirstOrDefault(x => x.Option.Id == choice.OptionId);
choice.ChoiceVersion = choiceVersions.OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber).FirstOrDefault(x => x.Choice.Id == choice.ChoiceId);
}

Using deepstream List for tens of thousands unique values

I wonder if it's a good/bad idea to use deepstream record.getList for storing a lot of unique values, for example, emails or any other unique identifiers. The main purpose is to be able to answer a question quickly whether we already have, say, a user with such email (email in use) or another record by specific unique field.
I made few experiments today and got two problems:
1) when I tried to populate the list with few thousands values I got
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
and my deepstream server went off. I was able to fix it by adding more memory to the server node process with this flag
--max-old-space-size=5120
it doesn't look fine but allowed me to make a list with more than 5000 items.
2) It wasn't enough for my tests so I precreated the list with 50000 items and put the data directly to rethinkdb table and got another issue on getting the list or modifing it:
RangeError: Maximum call stack size exceeded
I was able to fix it with another flag:
--stack-size=20000
It helps but I believe it's only matter of time when one of those errors appear in production when the list size reaches proper value. I don't know really whether it's nodejs, javascript, deepstream or rethinkdb issue. That's all in general made me think that I try to use deepstream List wrong way. Please, let me know. Thank you in advance!
Whilst you can use lists to store arrays of strings, they are actually intended as collections of recordnames - the actual data would be stored in the record itself, the list would only manage the order of the records.
Having said that, there are two open Github issues to improve performance for very long lists by sending more efficient deltas and by introducing a pagination option
Interesting results in regards to memory though, definitely something that needs to be handled more gracefully. In the meantime you could drastically improve performance by combining updates into one:
var myList = ds.record.getList( 'super-long-list' );
// Sends 10.000 messages
for( var i = 0; i < 10000; i++ ) {
myList.addEntry( 'something-' + i );
}
// Sends 1 message
var entries = [];
for( var i = 0; i < 10000; i++ ) {
entries.push( 'something-' + i );
}
myList.setEntries( entries );

Rails show different object every day

I want to match my user to a different user in his/her community every day. Currently, I use code like this:
#matched_user = User.near(#user).order("RANDOM()").first
But I want to have a different #matched_user on a daily basis. I haven't been able to find anything in Stack or in the APIs that has given me insight on how to do it. I feel it should be simpler than having to resort to a rake task with cron. (I'm on postgres.)
Whenever I find myself hankering for shared 'memory' or transient state, I think to myself "this is what (distributed) caches were invented for".
#matched_user = Rails.cache.fetch(#user.cache_key + '/daily_match', expires_in: 1.day) {
User.near(#user).order("RANDOM()").first
}
NOTE: While specifying a TTL for cache entry tells Rails/the cache system to try and keep that value for the given timeframe, there's NO guarantee that it will. In particular, a cache that aggressively tries to reclaim memory may expire an entry well before its desired expires_in time.
For this particular use case, it shouldn't be a big deal but in cases where the business/domain logic demands periodically generated values that are durable then you really have to factor that into your database.
How about using PostgreSQL's SETSEED function? I used the date to seed so that every day the seed will change, but within a day, the seed will be consistent.:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
#matched_user = User.near(#user).order("RANDOM()").first
You may want to seed a random value after using this so that any future calls to random aren't biased:
random = User.connection.execute("SELECT RANDOM()").to_a.first["random"]
# Same code as above:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
#matched_user = User.near(#user).order("RANDOM()").first
# Use random value before seed to make new seed:
User.connection.execute "SELECT SETSEED(#{random})"
I have split these steps in different sections just for readability. you can optimise query later.
1) Find all user records till today morning. so that the count will freeze.
usrs_till_today_morning = User.where("created_at <?", DateTime.now.in_time_zone(Time.zone).beginning_of_day)
2) Pluck all ID's
user_ids = usr_till_today_morning.pluck(:id)
3) Today date it will be a range (1..30) but will remain constant throughout the day.
day_today = Time.now.day
4) Select the same ID for the day
todays_user_id = user_ids[day_today % user_ids.count]
#matched_user = User.find(todays_user_id)
So it will give you random user records by maintaining same record throughout the day!!

Magento Bulk update attributes

I am missing the SQL out of this to Bulk update attributes by SKU/UPC.
Running EE1.10 FYI
I have all the rest of the code working but I"m not sure the who/what/why of
actually updating our attributes, and haven't been able to find them, my logic
is
Open a CSV and grab all skus and associated attrib into a 2d array
Parse the SKU into an entity_id
Take the entity_id and the attribute and run updates until finished
Take the rest of the day of since its Friday
Here's my (almost finished) code, I would GREATLY appreciate some help.
/**
* FUNCTION: updateAttrib
*
* REQS: $db_magento
* Session resource
*
* REQS: entity_id
* Product entity value
*
* REQS: $attrib
* Attribute to alter
*
*/
See my response for working production code. Hope this helps someone in the Magento community.
While this may technically work, the code you have written is just about the last way you should do this.
In Magento, you really should be using the models provided by the code and not write database queries on your own.
In your case, if you need to update attributes for 1 or many products, there is a way for you to do that very quickly (and pretty safely).
If you look in: /app/code/core/Mage/Adminhtml/controllers/Catalog/Product/Action/AttributeController.php you will find that this controller is dedicated to updating multiple products quickly.
If you look in the saveAction() function you will find the following line of code:
Mage::getSingleton('catalog/product_action')
->updateAttributes($this->_getHelper()->getProductIds(), $attributesData, $storeId);
This code is responsible for updating all the product IDs you want, only the changed attributes for any single store at a time.
The first parameter is basically an array of Product IDs. If you only want to update a single product, just put it in an array.
The second parameter is an array that contains the attributes you want to update for the given products. For example if you wanted to update price to $10 and weight to 5, you would pass the following array:
array('price' => 10.00, 'weight' => 5)
Then finally, the third and final attribute is the store ID you want these updates to happen to. Most likely this number will either be 1 or 0.
I would play around with this function call and use this instead of writing and maintaining your own database queries.
General Update Query will be like:
UPDATE
catalog_product_entity_[backend_type] cpex
SET
cpex.value = ?
WHERE cpex.attribute_id = ?
AND cpex.entity_id = ?
In order to find the [backend_type] associated with the attribute:
SELECT
  backend_type
FROM
  eav_attribute
WHERE entity_type_id =
  (SELECT
    entity_type_id
  FROM
    eav_entity_type
  WHERE entity_type_code = 'catalog_product')
AND attribute_id = ?
You can get more info from the following blog article:
http://www.blog.magepsycho.com/magento-eav-structure-role-of-eav_attributes-backend_type-field/
Hope this helps you.

Adding more information to Magento packingslip or Invoice PDF

How can i add additional information to the Magento Packingslip PDF. I am using integrated label paper, so i would like to add repeat the customers delivery address at the footer and also, some details like total quantity of items in the order and the cost of the items in the order. I am currently modifying local files in: Mage/Sales/Model/Order/Pdf/ but I have only managed to change to font.
EDIT:
Okay, i have made good progress and added most of the information i need, however, i have now stumbled across a problem.. I would like to get the total weight of all items in each order and Quantity.
I am using this code in the shipment.php:
Under the foreach (This is useful in case an order has a split delivery - as you can have one order with multiple "shipments" So this is why I have the code after here:
foreach ($shipment->getAllItems() as $item){
if ($item->getOrderItem()->getParentItem()) {
continue;
...
then I have this further down:
$shippingweight=0;
$shippingweight= $item->getWeight()*$item->getQty();
$page->drawText($shippingweight . Mage::helper('sales'), $x, $y, 'UTF-8');
This is great for Row totals. But not for the whole shipment. What I need to do is have this bit of code "added up" for each item in the shipment to create the total Weight of the whole shipment. It is very important that I only have the total of the shipment - not the order as I will be splitting shipments.
Almost at the end of getPdf(...) in Mage_Sales_Model_Order_Pdf_Shipment you will find the code that inserts new pages (lines 93-94 in 1.5.0.1)
if($this->y<15)
$page = $this->newPage(....)
happening in a for-loop.
You can change the logic here to make it change page earlier to make room for your extra information and then add it before the page shift. You should also place code after the for-block if you want it to appear on the last page as well.
Note: You should not change files in app/code/code/Mage directly. Instead, you should place your changed files under app/code/local/Mage using the same folder structure. That way your changes won't accidently get overwritten in an upgrade.
This should get you the total number of items in your order (there might be a faster way but don't know it off the top of my head):
$quote = Mage::getModel('sales/quote')->load($order->getQuoteId());
$itemsCount = $quote->getItemsSummaryQty();
For invoice PDF
at app/code/local/Mage/Sales/Model/Order/Pdf/Items/Invoice/Default.php
add this before $lineBlock
$product = Mage::getModel('catalog/product')->loadByAttribute('sku', $this->getSku($item), array('weight'));
$lines[][] = array(
'text' => 'Weight: '. $product->getData('weight')*1 .'Kg/ea. Total: ' .$product->getData('weight')*$item->getQty() . 'Kg',
'feed' => 400
);