'boost::archive::archive_exception' what(): unsupported version - serialization

I cloned this repository:
https://github.com/srianant/computer_vision
I built and ran it, and got this error:
OpenPose/DLIB Gesture, Action and Face Recognition.
resolution: 640x480
net_resolution: 656x368
handNetInputSize: 368x368
face_net_resolution: 368x368
cCamera Resolution set to: 640x480
Push following keys:
p for pause sample generation
f for generating face samples
t for train samples
c for continue camera feed
h for display key commands
q for quit program
terminate called after throwing an instance of 'boost::archive::archive_exception'
what(): unsupported version
The exception is thrown from line 32 of this file:
https://github.com/srianant/computer_vision/blob/master/openpose/src/openpose/user_code/pose_model.cpp
I think it is trying to deserialize this file:
https://github.com/srianant/computer_vision/blob/master/openpose/train_data/pose/pose.model

You're probably doing something wrong: have you checked your boost version? It could just be (very) old.
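One quick way to check is to read the archive library version stored in the file header and compare it with the newest version your Boost can read. This is a minimal sketch assuming the files are Boost text archives, whose first line starts with "22 serialization::archive <version>". The helper name read_archive_version is hypothetical. In recent Boost versions the supported version should be reported by boost::archive::BOOST_ARCHIVE_VERSION() from <boost/archive/basic_archive.hpp>.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the archive library version stored in the
// header of a Boost text archive, or -1 if the header is not recognized.
// Expected header: "<signature length> serialization::archive <version> ..."
int read_archive_version(std::istream& in) {
    int sig_len = 0;
    std::string signature;
    int version = -1;
    if (!(in >> sig_len >> signature >> version))
        return -1;
    // "serialization::archive" is 22 characters long, matching the leading 22
    if (signature != "serialization::archive" ||
        sig_len != static_cast<int>(signature.size()))
        return -1;
    return version;
}
```

For example, opening pose/pose_samples.txt with a std::ifstream and getting 14 back, on a Boost that only reads up to 12, would explain the "unsupported version" exception.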
I can deserialize all the input archives that use boost.
The only use of boost is in openpose_recognition.cpp, and the only archives being read are text archives from the training data.
It is easy to create a standalone program to verify that these files can be deserialized.
Creating The Minimal Declarations
Extracted from pose_model.hpp:
// Timesteps per samples
const int timesteps = 5; // 5 frames per sample
// 2-dim data to store different postures
const int max_pose_count = 10;
const int max_pose_score_pair = 36; // 34 pairs for pose + 2 padding for multiples of 4
typedef std::array<std::string, max_pose_count> pose;
typedef std::array<double,max_pose_score_pair*timesteps> pose_sample_type;
// 2-dim data to store different hand actions
const int max_hand_count = 10;
const int max_hand_score_pair = 40; // 40 pairs for hand
typedef std::array<std::string,max_hand_count> hand;
typedef std::array<double,max_hand_score_pair*timesteps> hand_sample_type;
// 2-dim data to store different faces
const int max_face_count = 10;
const int max_face_score_pair = 96; // 96 pairs (94 + 2 pairs for padding for multiples of 4)
typedef std::array<std::string,max_face_count> face;
typedef std::array<double,max_face_score_pair*timesteps> face_sample_type;
Test Method:
All extractions follow the same pattern; we'll add the debug_print functions later:
template <typename T>
void read_and_verify(std::string name, T& into) {
std::ifstream ifs("/tmp/so/computer_vision/openpose/train_data/" + name);
boost::archive::text_iarchive ia(ifs);
ia >> into;
debug_print(name, into);
}
Note: Adjust the path to match your file locations
Reading All The Archives:
int main() {
{
pose pose_names;
std::vector<pose_sample_type> pose_samples; // vector of m_pose_sample vectors
std::vector<double> pose_labels; // vector of pose labels (eg. 1,2,3...)
read_and_verify("pose/pose_names.txt", pose_names);
read_and_verify("pose/pose_samples.txt", pose_samples);
read_and_verify("pose/pose_labels.txt", pose_labels);
}
{
face face_names; // vector of face emotions names (eg. normal, sad, happy...)
std::vector<face_sample_type> face_samples; // vector of m_face_sample vectors
std::vector<double> face_labels; // vector of face emotions labels (eg. 1,2,3...)
read_and_verify("face/face_names.txt", face_names);
read_and_verify("face/face_samples.txt", face_samples);
read_and_verify("face/face_labels.txt", face_labels);
}
{
hand hand_names; // vector of hand gesture names (eg. fist, victory, stop...);
std::vector<hand_sample_type> left_hand_samples; // vector of m_left_hand_sample vectors
std::vector<hand_sample_type> right_hand_samples; // vector of m_right_hand_sample vectors
std::vector<double> left_hand_labels; // vector of left hand labels (eg. 1,2,3...)
std::vector<double> right_hand_labels; // vector of right hand labels (eg. 1,2,3...)
read_and_verify("hand/hand_names.txt", hand_names);
read_and_verify("hand/left_hand_samples.txt", left_hand_samples);
read_and_verify("hand/right_hand_samples.txt", right_hand_samples);
read_and_verify("hand/left_hand_labels.txt", left_hand_labels);
read_and_verify("hand/right_hand_labels.txt", right_hand_labels);
}
}
Printing Debug Info:
We need only three different debug_print overloads, for names, samples and labels:
template <size_t N> // names
void debug_print(std::string name, std::array<std::string, N> const& data) {
std::cout << name << ": ";
for (auto& n : data)
if (!n.empty())
std::cout << n << " ";
std::cout << "\n";
}
// labels
void debug_print(std::string name, std::vector<double> const& data) {
std::cout << name << ": ";
for (auto& a : data)
std::cout << a << " ";
std::cout << "\n";
}
template <size_t N> // samples
void debug_print(std::string name, std::vector<std::array<double, N> > const& data) {
std::cout << name << ": ";
for (auto& a : data)
std::cout << "{" << a[0] << "...} ";
std::cout << "\n";
}
DEMO TIME
The complete program is here:
Live On Coliru
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/array.hpp>
#include <boost/serialization/vector.hpp>
#include <array>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
///////////////// from pose_model.hpp
//
// Timesteps per samples
const int timesteps = 5; // 5 frames per sample
// 2-dim data to store different postures
const int max_pose_count = 10;
const int max_pose_score_pair = 36; // 34 pairs for pose + 2 padding for multiples of 4
typedef std::array<std::string, max_pose_count> pose;
typedef std::array<double,max_pose_score_pair*timesteps> pose_sample_type;
// 2-dim data to store different hand actions
const int max_hand_count = 10;
const int max_hand_score_pair = 40; // 40 pairs for hand
typedef std::array<std::string,max_hand_count> hand;
typedef std::array<double,max_hand_score_pair*timesteps> hand_sample_type;
// 2-dim data to store different faces
const int max_face_count = 10;
const int max_face_score_pair = 96; // 96 pairs (94 + 2 pairs for padding for multiples of 4)
typedef std::array<std::string,max_face_count> face;
typedef std::array<double,max_face_score_pair*timesteps> face_sample_type;
//
///////////////// end pose_model.hpp
template <size_t N> // names
void debug_print(std::string name, std::array<std::string, N> const& data) {
std::cout << name << ": ";
for (auto& n : data)
if (!n.empty())
std::cout << n << " ";
std::cout << "\n";
}
// labels
void debug_print(std::string name, std::vector<double> const& data) {
std::cout << name << ": ";
for (auto& a : data)
std::cout << a << " ";
std::cout << "\n";
}
template <size_t N> // samples
void debug_print(std::string name, std::vector<std::array<double, N> > const& data) {
std::cout << name << ": ";
for (auto& a : data)
std::cout << "{" << a[0] << "...} ";
std::cout << "\n";
}
template <typename T>
void read_and_verify(std::string name, T& into) {
std::ifstream ifs("/tmp/so/computer_vision/openpose/train_data/" + name);
boost::archive::text_iarchive ia(ifs);
ia >> into;
debug_print(name, into);
}
int main() {
{
pose pose_names;
std::vector<pose_sample_type> pose_samples; // vector of m_pose_sample vectors
std::vector<double> pose_labels; // vector of pose labels (eg. 1,2,3...)
read_and_verify("pose/pose_names.txt", pose_names);
read_and_verify("pose/pose_samples.txt", pose_samples);
read_and_verify("pose/pose_labels.txt", pose_labels);
}
{
face face_names; // vector of face emotions names (eg. normal, sad, happy...)
std::vector<face_sample_type> face_samples; // vector of m_face_sample vectors
std::vector<double> face_labels; // vector of face emotions labels (eg. 1,2,3...)
read_and_verify("face/face_names.txt", face_names);
read_and_verify("face/face_samples.txt", face_samples);
read_and_verify("face/face_labels.txt", face_labels);
}
{
hand hand_names; // vector of hand gesture names (eg. fist, victory, stop...);
std::vector<hand_sample_type> left_hand_samples; // vector of m_left_hand_sample vectors
std::vector<hand_sample_type> right_hand_samples; // vector of m_right_hand_sample vectors
std::vector<double> left_hand_labels; // vector of left hand labels (eg. 1,2,3...)
std::vector<double> right_hand_labels; // vector of right hand labels (eg. 1,2,3...)
read_and_verify("hand/hand_names.txt", hand_names);
read_and_verify("hand/left_hand_samples.txt", left_hand_samples);
read_and_verify("hand/right_hand_samples.txt", right_hand_samples);
read_and_verify("hand/left_hand_labels.txt", left_hand_labels);
read_and_verify("hand/right_hand_labels.txt", right_hand_labels);
}
}
Output (lines truncated at 100 characters):
pose/pose_names.txt: unknown close_to_camera standing sitting
pose/pose_samples.txt: {204.658...} {196.314...} {210.322...} {191.529...} {192.8...} {187.155...} {...
pose/pose_labels.txt: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
face/face_names.txt: unknown normal happy sad surprise
face/face_samples.txt: {89.5967...} {93.7026...} {97.6529...} {91.7247...} {91.8048...} {91.3076...}...
face/face_labels.txt: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
hand/hand_names.txt: unknown fist pinch wave victory stop thumbsup
hand/left_hand_samples.txt: {0...} {0.000524104...} {0...} {0...} {0...} {0...} {0...} {0...} {0...}...
hand/right_hand_samples.txt: {0...} {0.00166845...} {0.00161618...} {0.00176376...} {0.00167096...} ...
hand/left_hand_labels.txt: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1...
hand/right_hand_labels.txt: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
BONUS:
I had to use a hack to circumvent the file-size limit on Coliru: I changed the code to read bzip2-compressed files:
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/filtering_stream.hpp>
template <typename T>
void read_and_verify(std::string name, T& into) {
std::ifstream compressed("/tmp/so/computer_vision/openpose/train_data/" + name + ".bz2");
boost::iostreams::filtering_istream fs;
fs.push(boost::iostreams::bzip2_decompressor{});
fs.push(compressed);
boost::archive::text_iarchive ia(fs);
ia >> into;
debug_print(name, into);
}
Now you can actually see the output Live On Coliru

After replacing "::archive 14" with "::archive 12" in all the .txt files, the problem went away. @sehe, thanks very much!
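The header edit described above can be scripted. This is a minimal sketch (the function name is hypothetical) that rewrites only the version number in the first-line header of a text archive. It is a hack: it assumes the archive content does not rely on anything the newer archive version introduced, so upgrading Boost or regenerating the archives with the older Boost remains the cleaner fix.

```cpp
#include <string>

// Hypothetical helper: replaces the version in a Boost text archive header
// such as "22 serialization::archive 14 ..." and returns the new text.
// Returns the input unchanged if the header is not found at the start.
std::string downgrade_archive_version(const std::string& text, int new_version) {
    const std::string sig = "22 serialization::archive ";
    if (text.compare(0, sig.size(), sig) != 0)
        return text;
    // The version number runs from the end of the signature to the next space
    std::string::size_type end = text.find(' ', sig.size());
    if (end == std::string::npos)
        end = text.size();
    return sig + std::to_string(new_version) + text.substr(end);
}
```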

Related

Need help in checking type of edge/curve

I have created a small example of two intersecting circles.
In this example I am able to get the bounded faces and the source and target points.
I have manually plotted the source and target points; see the snapshot of the two intersecting circles:
I want to find out whether each edge between a source and a target point is a line segment, an arc, or a circle.
I tried to find this in the 2D arrangement documentation but couldn't find it.
Below is the code snippet:
#include <iostream>
#include <CGAL/Cartesian.h>
#include <CGAL/Exact_rational.h>
#include <CGAL/Arr_circle_segment_traits_2.h>
#include <CGAL/Arrangement_2.h>
typedef CGAL::Cartesian<CGAL::Exact_rational> Kernel;
typedef Kernel::Circle_2 Circle_2;
typedef CGAL::Arr_circle_segment_traits_2<Kernel> Traits_2;
typedef Traits_2::CoordNT CoordNT;
typedef Traits_2::Point_2 Point_2;
typedef Traits_2::Curve_2 Curve_2;
typedef CGAL::Arrangement_2<Traits_2> Arrangement_2;
int main()
{
// Create a circle centered at (0,0) with radius 8.
Kernel::Point_2 c1 = Kernel::Point_2(0, 0);
CGAL::Exact_rational sqr_r1 = CGAL::Exact_rational(64); // = 8^2
Circle_2 circ1 = Circle_2(c1, sqr_r1, CGAL::CLOCKWISE);
Curve_2 cv1 = Curve_2(circ1);
// Create a circle centered at (10,0) with radius 8.
Kernel::Point_2 c2 = Kernel::Point_2(10, 0);
CGAL::Exact_rational sqr_r2 = CGAL::Exact_rational(64); // = 8^2
Circle_2 circ2 = Circle_2(c2, sqr_r2, CGAL::CLOCKWISE);
Curve_2 cv2 = Curve_2(circ2);
Arrangement_2 arr;
insert(arr, cv1);
insert(arr, cv2);
for (auto fit = arr.faces_begin(); fit != arr.faces_end(); ++fit)
{
if (fit->is_unbounded())
std::cout << "Unbounded face.\n";
else {
Arrangement_2::Ccb_halfedge_circulator curr, start;
start = curr = fit->outer_ccb();
do {
std::cout << " source --> " << curr->source()->point() << "\n";
std::cout << " target --> " << curr->target()->point() << "\n";
++curr;
} while (curr != start);
std::cout << std::endl;
}
}
return 0;
}

CGAL hole filling with color

I need to implement 3D hole filling with the CGAL library, with support for color.
Is there any way to do it without modifying the CGAL library? I need to fill the hole with the average color of the hole's boundary edges.
Regards, Ali
....
int main(int argc, char* argv[])
{
const char* filename = (argc > 1) ? argv[1] : "data/mech-holes-shark.off";
Mesh mesh;
OpenMesh::IO::read_mesh(mesh, filename);
// Incrementally fill the holes
unsigned int nb_holes = 0;
BOOST_FOREACH(halfedge_descriptor h, halfedges(mesh))
{
if(CGAL::is_border(h,mesh))
{
std::vector<face_descriptor> patch_facets;
std::vector<vertex_descriptor> patch_vertices;
bool success = CGAL::cpp11::get<0>(
CGAL::Polygon_mesh_processing::triangulate_refine_and_fair_hole(
mesh,
h,
std::back_inserter(patch_facets),
std::back_inserter(patch_vertices),
CGAL::Polygon_mesh_processing::parameters::vertex_point_map(get(CGAL::vertex_point, mesh)).
geom_traits(Kernel())) );
CGAL_assertion(CGAL::is_valid_polygon_mesh(mesh));
std::cout << "* FILL HOLE NUMBER " << ++nb_holes << std::endl;
std::cout << " Number of facets in constructed patch: " << patch_facets.size() << std::endl;
std::cout << " Number of vertices in constructed patch: " << patch_vertices.size() << std::endl;
std::cout << " Is fairing successful: " << success << std::endl;
}
}
CGAL_assertion(CGAL::is_valid_polygon_mesh(mesh));
OpenMesh::IO::write_mesh(mesh, "filled_OM.off");
return 0;
}
If you use CGAL::Surface_mesh as Mesh, you can use dynamic property maps to attach attributes to your simplices, for example a color per face. The "standard" syntax for this is
mesh.add_property_map<face_descriptor, CGAL::Color >("f:color")
I think. There are examples in the documentation of Surface_mesh.
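For the averaging itself, a minimal sketch that does not depend on CGAL: collect the colors of the vertices on the hole boundary (reading them from a vertex color property map is an assumption about how your mesh stores colors), average each channel, and assign the result to the new patch faces or vertices. The Rgb struct and average_color function are hypothetical stand-ins, not CGAL API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGB color (stand-in for CGAL::Color or similar)
struct Rgb {
    std::uint8_t r, g, b;
};

// Average the colors collected along a hole boundary, rounding each
// channel to the nearest integer. Returns black for an empty input.
Rgb average_color(const std::vector<Rgb>& border) {
    if (border.empty())
        return Rgb{0, 0, 0};
    unsigned long sr = 0, sg = 0, sb = 0;
    for (const Rgb& c : border) {
        sr += c.r;
        sg += c.g;
        sb += c.b;
    }
    const unsigned long n = border.size();
    // (sum + n/2) / n rounds to nearest, with halves rounding up
    return Rgb{static_cast<std::uint8_t>((sr + n / 2) / n),
               static_cast<std::uint8_t>((sg + n / 2) / n),
               static_cast<std::uint8_t>((sb + n / 2) / n)};
}
```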

How do I iterate through all the faces in a CGAL StraightSkeleton_2 / HalfedgeDS?

My goal is to take a polygon, find the straight skeleton, then turn each face into its own polygon.
I'm using the CGAL Create_straight_skeleton_2.cpp example as a starting point. I can get it to compute the skeleton and iterate through the faces:
SsPtr iss = CGAL::create_interior_straight_skeleton_2(poly);
for( auto face = iss->faces_begin(); face != iss->faces_end(); ++face ) {
// How do I iterate through the vertexes?
}
But a HalfedgeDSFace only seems to offer halfedge(), which returns a HalfedgeDSHalfedge.
At that point I'm confused about how to iterate through the vertices of the face. Do I just treat it like a circular linked list and follow the next pointer until I get back to face->halfedge()?
Here's my first attempt at treating it like a circular linked list:
SsPtr iss = CGAL::create_interior_straight_skeleton_2(poly);
std::cout << "Faces:" << iss->size_of_faces() << std::endl;
for( auto face = iss->faces_begin(); face != iss->faces_end(); ++face ) {
std::cout << "----" << std::endl;
// Start at the face's halfedge and follow next() around the face
auto edge = face->halfedge();
do {
std::cout << edge->vertex()->point() << std::endl;
edge = edge->next();
} while (edge != face->halfedge());
}
But that seems to put an empty vertex in each face:
Faces:4
----
197.401 420.778
0 0
166.95 178.812
----
511.699 374.635
0 0
197.401 420.778
----
428.06 122.923
0 0
511.699 374.635
----
166.95 178.812
0 0
428.06 122.923
So the iteration is much as I'd expected:
// Each face
for( auto face = iss->faces_begin(); face != iss->faces_end(); ++face ) {
Ss::Halfedge_const_handle begin = face->halfedge();
Ss::Halfedge_const_handle edge = begin;
// Each vertex
do {
std::cout << edge->vertex()->point() << std::endl;
edge = edge->next();
} while (edge != begin);
}
The reason it wasn't working was the contour polygon I was using had a clockwise orientation. Once I reversed the order of the points I started getting valid data out of the faces.
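Since the straight skeleton functions expect the outer contour in counterclockwise order (which is what the fix above amounts to), it can help to check orientation before building the skeleton. A minimal sketch using the shoelace formula, where twice the signed area is positive for counterclockwise order; Pt is a hypothetical stand-in for your point type. With CGAL types, CGAL::Polygon_2 also provides is_counterclockwise_oriented() and reverse_orientation().

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };  // hypothetical stand-in for the polygon's point type

// Twice the signed area (shoelace formula): > 0 for counterclockwise order
double signed_area2(const std::vector<Pt>& poly) {
    double a = 0;
    for (std::size_t i = 0, n = poly.size(); i < n; ++i) {
        const Pt& p = poly[i];
        const Pt& q = poly[(i + 1) % n];
        a += p.x * q.y - q.x * p.y;
    }
    return a;
}

// Reverse the vertex order if the contour is clockwise
void make_counterclockwise(std::vector<Pt>& poly) {
    if (signed_area2(poly) < 0)
        std::reverse(poly.begin(), poly.end());
}
```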
For reference, here's how you'd iterate over the vertices of the contour:
// Pick a face and use the opposite edge to get on the contour.
Ss::Halfedge_const_handle begin = iss->faces_begin()->halfedge()->opposite();
Ss::Halfedge_const_handle edge = begin;
do {
std::cout << edge->vertex()->point() << std::endl;
// Iterate in the opposite direction.
edge = edge->prev();
} while (edge != begin);
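To see why following next() around a face (or prev() in the other direction) visits every vertex exactly once, here is a toy half-edge structure. The names are hypothetical and this is not CGAL's HalfedgeDS: each face's half-edges simply form a circular linked list, so a do/while loop that stops when it gets back to the starting half-edge walks the whole boundary.

```cpp
#include <vector>

// Toy half-edge: target vertex id plus indices of the next/prev half-edges
// around the same face (hypothetical, for illustration only)
struct HalfEdge {
    int vertex;
    int next;
    int prev;
};

// Collect vertex ids around a face, starting at half-edge `begin` and
// following next (forward) or prev (backward) until we are back at begin.
std::vector<int> face_vertices(const std::vector<HalfEdge>& hes, int begin,
                               bool forward = true) {
    std::vector<int> out;
    int e = begin;
    do {
        out.push_back(hes[e].vertex);
        e = forward ? hes[e].next : hes[e].prev;
    } while (e != begin);
    return out;
}
```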

Is there any GMP logarithm function?

Is there any logarithm function implemented in the GMP library?
I know you didn't ask how to implement it, but...
You can implement a rough one using the properties of logarithms: http://gnumbers.blogspot.com.au/2011/10/logarithm-of-large-number-it-is-not.html
And the internals of the GMP library: https://gmplib.org/manual/Integer-Internals.html
(Edit: basically, you just use the most significant "digit" (limb) of the GMP representation: since the base B of the representation is huge, B^N is much larger than B^{N-1}.)
Here is my implementation for rationals.
#include <gmp.h>
#include <climits>
#include <cmath>
#include <cstdlib>

double LogE(mpq_t m_op)
{
// log(a/b) = log(a) - log(b)
// And if a is represented in base B as:
// a = a_N B^N + a_{N-1} B^{N-1} + ... + a_0
// => log(a) \approx log(a_N B^N)
// = log(a_N) + N log(B)
// where B is the limb base, i.e. about ULONG_MAX (2^64 with 64-bit limbs)
static double logB = log(ULONG_MAX);
// Undefined logs: log(0) = -inf; a negative input should arguably return NAN.
// mpz_sgn() checks the whole numerator, not just its lowest limb.
if (mpz_sgn(mpq_numref(m_op)) <= 0)
return -INFINITY;
// Log of numerator: most significant limb plus (number of limbs - 1) * log(B)
double lognum = log(mpq_numref(m_op)->_mp_d[abs(mpq_numref(m_op)->_mp_size) - 1]);
lognum += (abs(mpq_numref(m_op)->_mp_size)-1) * logB;
// Subtract log of denominator (always >= 1 for a canonical mpq_t)
if (abs(mpq_denref(m_op)->_mp_size) > 0)
{
lognum -= log(mpq_denref(m_op)->_mp_d[abs(mpq_denref(m_op)->_mp_size)-1]);
lognum -= (abs(mpq_denref(m_op)->_mp_size)-1) * logB;
}
return lognum;
}
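The top-limb idea can be tried without GMP. This sketch stores a non-negative integer as 32-bit limbs, least significant first (a hypothetical layout mimicking GMP's _mp_d array, with B = 2^32 instead of the usual 2^64), and keeps only the most significant limb, as LogE() does.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// a = limbs[N] * B^N + ... + limbs[0], with B = 2^32 (least significant first).
// Approximates log(a) by log(limbs[N]) + N * log(B).
// Assumes the most significant limb is nonzero (normalized, like GMP).
double log_top_limb(const std::vector<std::uint32_t>& limbs) {
    if (limbs.empty())
        return -INFINITY;  // a == 0
    const double logB = 32 * std::log(2.0);
    const std::size_t N = limbs.size() - 1;
    return std::log(static_cast<double>(limbs[N])) + static_cast<double>(N) * logB;
}
```

Because a lies in [a_N B^N, (a_N + 1) B^N), the error is below log(1 + 1/a_N): negligible when the top limb is large, but up to log 2 when the top limb is 1.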
(Much later edit)
Coming back to this 5 years later, I just think it's cool that the core concept of log(a) = N log(B) + log(a_N) shows up even in native floating-point implementations; here is the glibc one for ia64.
And I used it again after encountering this question
No, there is no such function in GMP, only in MPFR.
The method below makes use of mpz_get_d_2exp and was obtained from the gmp R package. It can be found under the function biginteger_log in the file bigintegerR.cc (you first have to download the source, i.e. the tar file). You can also see it here: biginteger_log.
#include <gmp.h>
#include <cmath>
// Adapted for general use from the original biginteger_log
// xi = di * 2 ^ ex ==> log(xi) = log(di) + ex * log(2)
double biginteger_log_modified(mpz_t x) {
signed long int ex;
const double di = mpz_get_d_2exp(&ex, x);
return log(di) + log(2) * (double) ex;
}
Of course, the above method could be modified to return the log with any base using the properties of logarithm (e.g. the change of base formula).
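The same decomposition is available for plain doubles through std::frexp, which returns d in [0.5, 1) and ex with x = d * 2^ex, just as mpz_get_d_2exp does for an mpz_t. A minimal sketch that also applies the change-of-base formula log_b(x) = ln(x) / ln(b); the function name is hypothetical.

```cpp
#include <cmath>

// ln(x) = ln(d) + ex * ln(2), where x = d * 2^ex and 0.5 <= d < 1.
// Dividing by ln(base) gives the logarithm in any base (change of base).
// Assumes x > 0, base > 0 and base != 1.
double log_base_2exp(double x, double base) {
    int ex = 0;
    const double d = std::frexp(x, &ex);
    const double ln_x = std::log(d) + std::log(2.0) * ex;
    return ln_x / std::log(base);
}
```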
Here it is:
https://github.com/linas/anant
It provides GNU MP real and complex logarithm, exp, sine, cosine, gamma, arctan, sqrt, polylogarithm, Riemann and Hurwitz zeta, confluent hypergeometric, topologist's sine, and more.
As other answers said, there is no logarithm function in GMP. Some answers provided implementations of the logarithm, but only with double precision, not arbitrary precision.
Below I implemented a full (arbitrary) precision logarithm function, up to thousands of bits of precision if you wish, using mpf, the generic floating-point type of GMP.
My code uses the Taylor series for ln(1 + x) plus mpf_sqrt() (to speed up the computation).
The code is in C++, and it is quite large for two reasons. First, it does precise time measurements to find the best combination of internal computational parameters for your machine. Second, it uses extra speed improvements, such as additional mpf_sqrt() calls to prepare the initial value.
The algorithm is as follows:
1. Factor out the exponent of 2 from the input x, i.e. rewrite x = d * 2^exp, using mpf_get_d_2exp().
2. Adjust d (from step 1) so that 2/3 <= d <= 4/3; this is achieved by multiplying d by 2 and doing --exp when d < 2/3. This ensures that d differs from 1 by at most 1/3, i.e. the allowed range extends equally far from 1 in both directions.
3. Divide x by 2^exp, using mpf_div_2exp() (or mpf_mul_2exp() when exp is negative).
4. Take the square root of x several times (num_sqrt times) so that x gets closer to 1. This makes the Taylor series converge faster, because computing a few square roots is cheaper than running many extra Taylor series iterations.
5. Compute the Taylor series for ln(1 + t), with t = x - 1, up to the desired precision (even thousands of bits if needed).
6. Because in step 4 we took the square root num_sqrt times, multiply y (the result of the Taylor series) by 2^num_sqrt.
7. Finally, because in step 1 we factored out 2^exp, add ln(2) * exp to y. Here ln(2) is computed by a single recursive call to the same function that implements the whole algorithm.
The steps above follow from the chain of formulas ln(x) = ln(d * 2^exp) = ln(d) + exp * ln(2) = 2^num_sqrt * ln(sqrt(...sqrt(d))) + exp * ln(2), where the square root is applied num_sqrt times.
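To make the steps concrete, here is a minimal double-precision sketch of the same algorithm. It is not the mpf code below: std::frexp stands in for mpf_get_d_2exp and double for mpf_class, so it is only as accurate as a double. The names ln1p_taylor and ln_sketch are hypothetical; sqrt_range plays the same role as in mpf_ln().

```cpp
#include <cmath>

// Taylor series ln(1 + t) = t - t^2/2 + t^3/3 - ..., valid for small |t|
double ln1p_taylor(double t) {
    double power = t;  // (-1)^(i+1) * t^i
    double sum = t;
    for (int i = 2; i < 200; ++i) {
        power *= -t;
        const double term = power / i;
        sum += term;
        if (std::fabs(term) < 1e-18)
            break;
    }
    return sum;
}

// Double-precision sketch of the mpf_ln() steps. Assumes x > 0, sqrt_range > 0.
double ln_sketch(double x, double sqrt_range = 0.01) {
    // Step 1: x = d * 2^ex with 0.5 <= d < 1
    int ex = 0;
    double d = std::frexp(x, &ex);
    // Step 2: move d into [2/3, 4/3)
    if (d < 2.0 / 3) {
        d *= 2;
        --ex;
    }
    // Steps 3-4: d already equals x / 2^ex; repeated square roots pull it toward 1
    int num_sqrt = 0;
    while (std::fabs(d - 1) >= sqrt_range) {
        d = std::sqrt(d);
        ++num_sqrt;
    }
    // Step 5: Taylor series for ln(1 + t) with t = d - 1
    double y = ln1p_taylor(d - 1);
    // Step 6: undo the square roots: multiply by 2^num_sqrt
    y = std::ldexp(y, num_sqrt);
    // Step 7: add ex * ln(2); ln(2) = 4 * ln(2^(1/4)), and 2^(1/4) ends with ex == 0
    if (ex != 0) {
        const double ln2 = 4 * ln_sketch(std::sqrt(std::sqrt(2.0)), sqrt_range);
        y += ln2 * ex;
    }
    return y;
}
```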
My implementation automatically does timings (just once per program run) to figure out how many square roots are needed to balance the Taylor series computation. If you need to avoid the timings, pass the third parameter sqrt_range to mpf_ln() as 0.001 instead of zero.
The main() function contains usage examples, a correctness test (comparing against the lower-precision std::log()), timings, and verbose output. The function is tested on the first 1024 bits of Pi.
Before calling mpf_ln(), don't forget to set the needed precision by calling mpf_set_default_prec(bits) with the desired precision in bits.
The computation time of mpf_ln() is about 40-90 microseconds for 1024-bit precision. Higher precision takes more time, approximately linearly proportional to the number of precision bits.
The very first call takes considerably longer because it precomputes the timings table and the value of ln(2). So it is advisable to do a single computation at program start, to avoid a longer computation inside a time-critical region later in the code.
To compile, for example on Linux, install the GMP library and run:
clang++-14 -std=c++20 -O3 -o main main.cpp -lgmpxx -lgmp && ./main
Try it online!
#include <algorithm>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <cmath>
#include <chrono>
#include <mutex>
#include <string>
#include <utility>
#include <vector>
#include <unordered_map>
#include <gmpxx.h>
double Time() {
static auto const gtb = std::chrono::high_resolution_clock::now();
return std::chrono::duration_cast<std::chrono::duration<double>>(
std::chrono::high_resolution_clock::now() - gtb).count();
}
mpf_class mpf_ln(mpf_class x, bool verbose = false, double sqrt_range = 0) {
auto total_time = verbose ? Time() : 0.0;
int const prec = mpf_get_prec(x.get_mpf_t());
if (sqrt_range == 0) {
static std::mutex mux;
std::lock_guard<std::mutex> lock(mux);
static std::vector<std::pair<size_t, double>> ranges;
if (ranges.empty())
mpf_ln(3.14, false, 0.01);
while (ranges.empty() || ranges.back().first < prec) {
size_t const bits = ranges.empty() ? 64 : ranges.back().first * 3 / 2;
mpf_class x = 3.14;
mpf_set_prec(x.get_mpf_t(), bits);
double sr = 0.35, sr_best = 1, time_best = 1000;
size_t constexpr ntests = 5;
while (true) {
auto tim = Time();
for (size_t i = 0; i < ntests; ++i)
mpf_ln(x, false, sr);
tim = (Time() - tim) / ntests;
bool updated = false;
if (tim < time_best) {
sr_best = sr;
time_best = tim;
updated = true;
}
sr /= 1.5;
if (sr <= 1e-8) {
ranges.push_back(std::make_pair(bits, sr_best));
break;
}
}
}
sqrt_range = std::lower_bound(ranges.begin(), ranges.end(), size_t(prec),
[](auto const & a, auto const & b){
return a.first < b;
})->second;
}
signed long int exp = 0;
// https://gmplib.org/manual/Converting-Floats
double d = mpf_get_d_2exp(&exp, x.get_mpf_t());
if (d < 2.0 / 3) {
d *= 2;
--exp;
}
mpf_class t;
// https://gmplib.org/manual/Float-Arithmetic
if (exp >= 0)
mpf_div_2exp(x.get_mpf_t(), x.get_mpf_t(), exp);
else
mpf_mul_2exp(x.get_mpf_t(), x.get_mpf_t(), -exp);
auto sqrt_time = verbose ? Time() : 0.0;
// Multiple Sqrt of x
int num_sqrt = 0;
if (x >= 1)
while (x >= 1.0 + sqrt_range) {
// https://gmplib.org/manual/Float-Arithmetic
mpf_sqrt(x.get_mpf_t(), x.get_mpf_t());
++num_sqrt;
}
else
while (x <= 1.0 - sqrt_range) {
mpf_sqrt(x.get_mpf_t(), x.get_mpf_t());
++num_sqrt;
}
if (verbose)
sqrt_time = Time() - sqrt_time;
// Error threshold 2^-(prec + 8); not static, since prec differs between calls
mpf_class const eps = [&]{
mpf_class eps = 1;
mpf_div_2exp(eps.get_mpf_t(), eps.get_mpf_t(), prec + 8);
return eps;
}(), meps = -eps;
// Taylor Serie for ln(1 + x)
// https://math.stackexchange.com/a/878376/826258
x -= 1;
mpf_class k = x, y = x, mx = -x;
size_t num_iters = 0;
for (int32_t i = 2;; ++i) {
k *= mx;
y += k / i;
// Check if error is small enough
if (meps <= k && k <= eps) {
num_iters = i;
break;
}
}
auto VerboseInfo = [&]{
if (!verbose)
return;
total_time = Time() - total_time;
std::cout << std::fixed << "Sqrt range " << sqrt_range << ", num sqrts "
<< num_sqrt << ", sqrt time " << sqrt_time << " sec" << std::endl;
std::cout << "Ln number of iterations " << num_iters << ", ln time "
<< total_time << " sec" << std::endl;
};
// Correction due to multiple sqrt of x
y *= 1 << num_sqrt;
if (exp == 0) {
VerboseInfo();
return y;
}
mpf_class ln2;
{
static std::mutex mutex;
std::lock_guard<std::mutex> lock(mutex);
static std::unordered_map<size_t, mpf_class> ln2s;
auto it = ln2s.find(size_t(prec));
if (it == ln2s.end()) {
mpf_class sqrt_sqrt_2 = 2;
mpf_sqrt(sqrt_sqrt_2.get_mpf_t(), sqrt_sqrt_2.get_mpf_t());
mpf_sqrt(sqrt_sqrt_2.get_mpf_t(), sqrt_sqrt_2.get_mpf_t());
it = ln2s.insert(std::make_pair(size_t(prec), mpf_class(mpf_ln(sqrt_sqrt_2, false, sqrt_range) * 4))).first;
}
ln2 = it->second;
}
y += ln2 * exp;
VerboseInfo();
return y;
}
std::string mpf_str(mpf_class const & x) {
mp_exp_t exp;
auto s = x.get_str(exp);
return s.substr(0, exp) + "." + s.substr(exp);
}
int main() {
// https://gmplib.org/manual/Initializing-Floats
mpf_set_default_prec(1024); // bit-precision
// http://www.math.com/tables/constants/pi.htm
mpf_class x(
"3."
"1415926535 8979323846 2643383279 5028841971 6939937510 "
"5820974944 5923078164 0628620899 8628034825 3421170679 "
"8214808651 3282306647 0938446095 5058223172 5359408128 "
"4811174502 8410270193 8521105559 6446229489 5493038196 "
"4428810975 6659334461 2847564823 3786783165 2712019091 "
"4564856692 3460348610 4543266482 1339360726 0249141273 "
"7245870066 0631558817 4881520920 9628292540 9171536436 "
);
std::cout << std::boolalpha << std::fixed << std::setprecision(14);
std::cout << "x:" << std::endl << mpf_str(x) << std::endl;
auto cmath_val = std::log(mpf_get_d(x.get_mpf_t()));
std::cout << "cmath ln(x): " << std::endl << cmath_val << std::endl;
auto volatile tmp = mpf_ln(x); // Pre-Compute to heat-up timings table.
auto time_start = Time();
size_t constexpr ntests = 20;
for (size_t i = 0; i < ntests; ++i) {
auto volatile tmp = mpf_ln(x);
}
std::cout << "mpf ln(x) time " << (Time() - time_start) / ntests << " sec" << std::endl;
auto mpf_val = mpf_ln(x, true);
std::cout << "mpf ln(x):" << std::endl << mpf_str(mpf_val) << std::endl;
std::cout << "equal to cmath: " << (std::abs(mpf_get_d(mpf_val.get_mpf_t()) - cmath_val) <= 1e-14) << std::endl;
return 0;
}
Output:
x:
3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724587007
cmath ln(x):
1.14472988584940
mpf ln(x) time 0.00004426845000 sec
Sqrt range 0.00000004747981, num sqrts 23, sqrt time 0.00001440000000 sec
Ln number of iterations 42, ln time 0.00003873100000 sec
mpf ln(x):
1.144729885849400174143427351353058711647294812915311571513623071472137769884826079783623270275489707702009812228697989159048205527923456587279081078810286825276393914266345902902484773358869937789203119630824756794011916028217227379888126563178049823697313310695003600064405487263880223270096433504959511813198
equal to cmath: true

g++ SSE intrinsics dilemma - value from intrinsic "saturates"

I wrote a simple program that uses SSE intrinsics to compute the inner product of two large (100000 or more elements) vectors. The program compares the execution time of computing the inner product the conventional way and with intrinsics. Everything works fine until I insert (just for the fun of it) an inner loop before the statement that computes the inner product. Before I go further, here is the code:
//this is a sample Intrinsics program to compute inner product of two vectors and compare Intrinsics with traditional method of doing things.
#include <iostream>
#include <iomanip>
#include <xmmintrin.h>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
using namespace std;
typedef float v4sf __attribute__ ((vector_size(16)));
double innerProduct(float* arr1, int len1, float* arr2, int len2) { //assume len1 = len2.
float result = 0.0;
for(int i = 0; i < len1; i++) {
for(int j = 0; j < len1; j++) {
result += (arr1[i] * arr2[i]);
}
}
//float y = 1.23e+09;
//cout << "y = " << y << endl;
return result;
}
double sse_v4sf_innerProduct(float* arr1, int len1, float* arr2, int len2) { //assume that len1 = len2.
if(len1 != len2) {
cout << "Lengths not equal." << endl;
exit(1);
}
/*steps:
* 1. load a long-type (4 float) into a v4sf type data from both arrays.
* 2. multiply the two.
* 3. multiply the same and store result.
* 4. add this to previous results.
*/
v4sf arr1Data, arr2Data, prevSums, multVal, xyz;
//__builtin_ia32_xorps(prevSums, prevSums); //making it equal zero.
//can explicitly load 0 into prevSums using loadps or storeps (Check).
float temp[4] = {0.0, 0.0, 0.0, 0.0};
prevSums = __builtin_ia32_loadups(temp);
float result = 0.0;
for(int i = 0; i < (len1 - 3); i += 4) {
for(int j = 0; j < len1; j++) {
arr1Data = __builtin_ia32_loadups(&arr1[i]);
arr2Data = __builtin_ia32_loadups(&arr2[i]); //store the contents of two arrays.
multVal = __builtin_ia32_mulps(arr1Data, arr2Data); //multiply.
xyz = __builtin_ia32_addps(multVal, prevSums);
prevSums = xyz;
}
}
//prevSums will hold the sums of 4 32-bit floating point values taken at a time. Individual entries in prevSums also need to be added.
__builtin_ia32_storeups(temp, prevSums); //store prevSums into temp.
cout << "Values of temp:" << endl;
for(int i = 0; i < 4; i++)
cout << temp[i] << endl;
result += temp[0] + temp[1] + temp[2] + temp[3];
return result;
}
int main() {
clock_t begin, end;
int length = 100000;
float *arr1, *arr2;
double result_Conventional, result_Intrinsic;
// printStats("Allocating memory.");
arr1 = new float[length];
arr2 = new float[length];
// printStats("End allocation.");
srand(time(NULL)); //init random seed.
// printStats("Initializing array1 and array2");
begin = clock();
for(int i = 0; i < length; i++) {
// for(int j = 0; j < length; j++) {
// arr1[i] = rand() % 10 + 1;
arr1[i] = 2.5;
// arr2[i] = rand() % 10 - 1;
arr2[i] = 2.5;
// }
}
end = clock();
cout << "Time to initialize array1 and array2 = " << ((double) (end - begin)) / CLOCKS_PER_SEC << endl;
// printStats("Finished initialization.");
// printStats("Begin inner product conventionally.");
begin = clock();
result_Conventional = innerProduct(arr1, length, arr2, length);
end = clock();
cout << "Time to compute inner product conventionally = " << ((double) (end - begin)) / CLOCKS_PER_SEC << endl;
// printStats("End inner product conventionally.");
// printStats("Begin inner product using Intrinsics.");
begin = clock();
result_Intrinsic = sse_v4sf_innerProduct(arr1, length, arr2, length);
end = clock();
cout << "Time to compute inner product with intrinsics = " << ((double) (end - begin)) / CLOCKS_PER_SEC << endl;
//printStats("End inner product using Intrinsics.");
cout << "Results: " << endl;
cout << " result_Conventional = " << result_Conventional << endl;
cout << " result_Intrinsics = " << result_Intrinsic << endl;
return 0;
}
I use the following g++ invocation to build this:
g++ -W -Wall -O2 -pedantic -march=i386 -msse intrinsics_SSE_innerProduct.C -o innerProduct
Each of the loops above, in both functions, runs a total of N^2 times. Given that arr1 and arr2 (the two floating-point vectors) are filled with the value 2.5 and the array length is 100,000, the result in both cases should be 100,000^2 * 2.5 * 2.5 = 6.25e+10. The results I get are:
Results:
result_Conventional = 6.25e+10
result_Intrinsics = 5.36871e+08
This is not all. It seems that the value returned from the function that uses intrinsics "saturates" at the value above. I tried other values for the array elements and different sizes too, but any array value above 1.0 and any size above 1000 gives the same value as above.
Initially, I thought it might be because all operations within SSE are in floating point, but a float should be able to store a number on the order of 1e+08.
I am trying to see where I could be going wrong but cannot seem to figure it out. I am using g++ version: g++ (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2).
Any help on this is most welcome.
Thanks,
Sriram.
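The problem you are having is that while a float can store 6.25e+10, it only has about 7 significant decimal digits of precision (a 24-bit significand).
This means that when you build a large number by adding lots of small numbers together one at a time, you reach a point where the small number is less than half the gap between adjacent floats near the running sum, so adding it rounds back to the same value and has no effect. Here, once a lane reaches 2^27 = 134217728, adjacent floats are 16 apart and adding 6.25 changes nothing; four lanes stuck at 2^27 sum to 536870912, which prints as 5.36871e+08.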
As to why you are not getting this behaviour in the non-intrinsic version: it is likely that the result variable is being held in a register that uses a higher precision than a float in memory (with -march=i386, scalar floating-point math goes through the x87 FPU, whose registers are 80 bits wide), so it is not rounded to float precision on every iteration of the loop. You would have to look at the generated assembly to be sure.
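The stagnation is easy to reproduce without intrinsics. This sketch (function names hypothetical) adds 6.25 (= 2.5 * 2.5) repeatedly into a float and into a double accumulator. Assuming float arithmetic is really done in single precision, as on typical x86-64 builds with SSE math (FLT_EVAL_METHOD == 0), the float sum gets stuck at 2^27 = 134217728, while the double sum stays exact for these sizes.

```cpp
#include <cstdint>

// Repeatedly add `step` into a single-precision accumulator
float accumulate_float(float step, std::int64_t n) {
    float sum = 0.0f;
    for (std::int64_t i = 0; i < n; ++i)
        sum += step;
    return sum;
}

// Same loop with a double-precision accumulator
double accumulate_double(double step, std::int64_t n) {
    double sum = 0.0;
    for (std::int64_t i = 0; i < n; ++i)
        sum += step;
    return sum;
}
```

Accumulating in double (for example, converting each partial product before adding) is one way to keep the intrinsic version from saturating.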