IT Anawer: Apr 13, 2011

Wednesday, April 13, 2011

How to determine the state of a process (i.e. if it is a zombie)

Hi,

how can I get information on the state of a process (i.e. if it is a zombie) using C under Linux?

After reading the answers so far I want to narrow my question somewhat: I would prefer a pure C solution. After reading the ps source (which reads /proc/) I thought that there should be a better way and asked here :)

From stackoverflow

I know only two ways:
- Parsing output of the ps command
- Reading files in /proc/PID, where PID is the process identifier (that's what ps does internally)
simplyharsh : i think you should clarify a bit

Found here:

Use this command to display all of your zombie processes:

ps aux | awk '{ print $8 " " $2 }' | grep -w Z

This could be easily parsed using C.

If you want the processes running on your machine, use

$ ps aux

ps displays information about a selection of the active processes. If you want a repetitive update of the selection and the displayed information, use top instead.

simplyharsh : yeah i guess TOP is a good idea. just need to be parsed in C.

dmckee : I think "using C" means in a c program (i.e. not at the command prompt), and "under Linux" tells you what OS APIs he has access to.

aatifh : @dmckee hehe I know that dude. :)

aatifh : @taurean correct
You'll want to learn about interacting with the /proc/ "pseudo-filesystem" via typical C standard library calls. The documentation necessary to get started is included with any Linux distro and is a simple Google search away.

(Now that you know what to search for. I know that's usually most of the challenge!)

In short, the directories and files within the /proc/ directory of a running Linux system reflect the state of the running kernel, which (naturally) includes processes. However, before you charge in you need to keep some information in mind.

A zombie process isn't the same thing as an orphaned process. An orphaned process is a process that is still running after its parent has exited; it is adopted by init. A zombie process is a process which has exited and released its resources, but still holds an entry in the process table.

This typically happens when a process is launched by a program. You see, the kernel won't remove a finished sub-process' entry in the process table until the parent program properly fetches the return status of the sub-process. That makes sense; how else would the parent program know if the subprocess exited improperly?

So all subprocesses are technically zombies for at least a very short time. It's not inherently a bad state for a program to be in.
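
As a rough illustration of the point above, the following C sketch forks a child, lets it exit, and only then fetches its status. The helper name spawn_and_reap is hypothetical (it is not from the original discussion), and using waitid() with WNOWAIT is just one assumed way to pause at the moment the child is a zombie.

```c
#define _POSIX_C_SOURCE 200809L  /* for waitid(), WNOWAIT, kill() declarations */

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with exit_code, then reaps it.
 * Returns the child's exit status, or -1 on any error. */
int spawn_and_reap(int exit_code)
{
    int status;
    siginfo_t info;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);       /* child: terminate immediately */

    /* Wait until the child has exited, but leave it unreaped (WNOWAIT).
     * From here until waitpid() below, the child is a zombie. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    /* The zombie still occupies its process-table entry, so signal 0
     * (a pure existence check) still succeeds. */
    if (kill(pid, 0) != 0)
        return -1;

    /* Fetching the status reaps the child and frees its entry. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A parent that never reaches the waitpid() step (or an equivalent SIGCHLD handler) is the usual way long-lived zombies accumulate.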

Indeed, "zombies" are sometimes created intentionally. For example, sometimes a zombie entry is left in place by a program for a while so that further launched processes won't get the same PID as the previously-launched (and now zombie) process.

In other words, if you go SIGCHLDing zombie processes unnecessarily you might create a serious problem for the spawning program. However, if a process has been a zombie for a half hour or more, it's probably a sign of a bug.

Edit: The question changed on me! No, there's no simpler way than how ps does it. If there were, it would have been integrated into ps a long time ago. The /proc files are the be-all-end-all source for information on the kernel's state. :)
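
For what it's worth, here is a minimal pure-C sketch of the /proc approach. It assumes the Linux layout of /proc/PID/stat described in proc(5), where the one-character state follows the parenthesised command name; the function names process_state and is_zombie are made up for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Returns the state character from /proc/<pid>/stat ('R', 'S', 'Z', ...),
 * or 0 if the file cannot be opened or parsed (e.g. no such process). */
char process_state(pid_t pid)
{
    char path[64];
    char buf[1024];
    FILE *f;
    size_t n;
    char *p;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The line looks like "pid (comm) state ...". The command name may
     * itself contain spaces or ')', so look for the LAST ')'. */
    p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Nonzero if the process is currently reported as a zombie ('Z'). */
int is_zombie(pid_t pid)
{
    return process_state(pid) == 'Z';
}
```

Keep in mind that the answer is only a snapshot: the process may be reaped, and its PID reused, right after the file has been read.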

dmckee : It's a nice discussion of the task pitched for a less sophisticated audience

Tilo Prütz : Your answer led me to rethink my approach to fixing a bug where the parent did not wait for the children properly. I reanalyzed the code and found the point where the parent failed to wait for the children. Thanks

Which is more efficient, PHP string functions or regex in PHP?

I'm writing PHP code to parse a string. It needs to be as fast as possible, so are regular expressions the way to go? I have a hunch that PHP string functions are more expensive, but it's just a guess. What's the truth?

Here's specifically what I need to do with the string:

Grab the first half (based on the third location of a substring "000000") and compare its hash to the next 20 bytes, throwing away anything left.

Parse the 9th byte through the next "000000" as one piece of data. Then grab the next 19 bytes after that, and split that into 8 (toss 1) and 8. Then I do some other stuff that converts those two 8 byte strings into dates.

So that's the kind of thing I need to do.

From stackoverflow

Native string functions are way faster. The benefit of regexp is that you can do pretty much anything with them.
I believe there is a threshold from which a regular expression is faster than a bunch of PHP string function calls. Anyway, depends a lot on what you're doing. You have to find out the balance.

Now that you edited your question. I'd use string functions for what you're trying to accomplish. strpos() and substr() is what comes to mind at a first glance.
I think if you want the highest performance, you should avoid regex: it minimizes effort but won't give the best performance, because you can almost always tailor code that uses string routines to the specific problem and gain a big performance boost. But for simple parsing routines that can't be optimized much, you can still use regex, as it won't make a big difference there.

EDIT: For this specific problem you posted I'd favor string operations, but only because I wouldn't know how to do it in regex. This seems to be pretty straightforward, except for the hash, so I think regex/string functions won't make a big difference.
It depends on your case: if you're trying to do something fairly basic (eg: search for a string, replace a substring with something else), then the regular string functions are the way to go. If you want to do something more complicated (eg: search for IP addresses), then the Regex functions are definitely a better choice.

I haven't profiled regexes so I can't say that they'll be faster at runtime, but I can tell you that the extra time spent hacking together the equivalent using the basic functions wouldn't be worth it.

Edit with the new information in the OP:

It sounds as though you actually need to do a number of small string operations here. Since each one individually is quite basic, and I doubt you'd be able to do all those steps (or even a couple of those steps) at one time using a regex, I'd go with the basic functions:

Grab the first half (based on the third location of a substring "000000") and compare its hash to the next 20 bytes, throwing away anything left.

Use: strpos() and substr()
Or : /^(.*?0{6}.*?0{6}.*?)0{6}/

Then grab the next 19 bytes after that, and split that into 8 (toss 1) and 8.

Use: substr() - (I assume you mean 17 bytes here -- 8 + 1 + 8)
```
$part1 = substr($myStr, $currPos, 8);
$part2 = substr($myStr, $currPos + 9, 8);
```
troelskn : Regexp are surprisingly efficient. You shouldn't generally be afraid of using them as the default tool.
Depends on your needs. Most regular expression operations are faster than one would think and can even outperform builtin string functions in certain trivial operations. Note that I have the preg library in mind, not the builtin regex library, which is quite slow.
If what you're doing is at all reasonable to do using string functions, you should use them. Like, if you're determining whether a constant string 'abc' occurs in $value, you definitely want to check strpos($value, 'abc') !== false, not preg_match('/abc/', $value). If you find yourself doing a lot of string reshuffling and transformations in order to accomplish what you would've with a regex, though, you're almost certainly going to wind up destroying both performance and maintainability.

When concerned about speed, though, when it comes down to it, don't think about it, clock it. The 'time' command is your friend.
In general, string functions are faster and regex functions are more flexible.

As with anything else, your results may vary; the only way to know for sure is to try it both ways and benchmark.

Non-static const member, can't use default assignment operator

A program I'm expanding uses std::pair<> a lot.

There is a point in my code at which the compiler throws a rather large:

Non-static const member, 'const Ptr<double, double> std::pair<const Ptr<double, double>, const double*>::first' can't use default assignment operator

I'm not really sure what this is referring to? Which methods are missing from the Ptr class?

The original call that causes this problem is as follows:

vector_of_connections.push_back(pair(Ptr<double,double>,WeightValue*));

Where it's putting a std::pair<Ptr<double,double>, WeightValue*> onto a vector, where WeightValue* is a const variable from about 3 functions back, and the Ptr<double,double> is taken from an iterator that works over another vector.

For future reference, Ptr is a pointer to a Node object.

From stackoverflow

At least mention which object the compiler is complaining about. Most probably you are missing a custom assignment member. If you don't have one, the default one kicks in. Probably, you also have a const member in that class (whose objects are being assigned) and since a const member cannot be changed you hit that error.

Another approach: Since it's a class const, I suggest that you change it to a static const if that makes sense.
You have a case like this:
```
struct sample {
    int const a; // const!

    sample(int a):a(a) { }
};
```
Now, you use that in some context that requires sample to be assignable - possible in a container (like a map, vector or something else). This will fail, because the implicitly defined copy assignment operator does something along this line:
```
// pseudo code, for illustration
a = other.a;
```
But a is const! You have to make it non-const. It doesn't hurt because as long as you don't change it, it's still logically const :) You could fix the problem by introducing a suitable operator= too, so that the compiler doesn't define one implicitly. But that's bad because you will not be able to change your const member. Thus you'd have an operator=, but the type would still not really be assignable (because the copy and the assigned value are not identical!):
```
struct sample {
    int const a; // const!

    sample(int a):a(a) { }

    // bad! compiles, but cannot copy the const member a
    sample & operator=(sample const&) { return *this; }
};
```
However, in your case the problem apparently lies within std::pair<A, B>. Remember that a std::map is sorted on the keys it contains. Because of that, you cannot change its keys, because that could easily render the state of a map invalid. Because of that, the following holds:
```
typedef std::map<A, B> map;
map::value_type <=> std::pair<A const, B>
```
That is, it forbids changing its keys that it contains! So if you do
```
*mymap.begin() = make_pair(anotherKey, anotherValue);
```
The compiler throws an error at you, because in the pair stored in the map, the ::first member has a const-qualified type!

As far as I can tell, someplace you have something like:

// for ease of reading 
typedef std::pair<const Ptr<double, double>, const double*> MyPair;

MyPair myPair = MAKEPAIR(.....);
myPair.first = .....;

Since the members of MyPair are const, you can't assign to them.

DataGridView Binding

I have a gridview that I am binding to via a generic list. I have set all the columns myself. I am just trying to:

Catch the event PRE format error when a row is edited- get the row information via a hidden field - and persist

I am sure this must be pretty easy but I haven't done much with forms work and I am unfamiliar with its DataGridViews Events.

From stackoverflow

There are two ways of looking at this:

- handle the CellParsing event and parse the value
- use a custom TypeConverter on the property

I usually prefer the latter, since it takes this logic away from the UI; I'll see if I can do an example...

Example (most of this code is the "show it working" code); here I define a MyDateTimeConverter, which formats/parses dates as their backwards "dd MMM yyyy" text (for no really good reason), and associate that converter with one of the properties. You can edit the values in the grid, and changes are pushed back in (change rows to see the "actual" value update). It doesn't show immediately because of some nuances around change-notification; it wasn't worth making the example more complex just for this...

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Globalization;
using System.Windows.Forms;

class Person
{
    public string Forename { get; set; }
    public string Surname { get; set; }

    [TypeConverter(typeof(MyDateTimeConverter))]
    public DateTime EditableValue { get { return ActualValue; } set { ActualValue = value; } }
    // this just proves what we have set...
    public DateTime ActualValue { get; private set; }
}
class MyDateTimeConverter : TypeConverter
{
    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
    {
        return sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);
    }
    public override bool CanConvertTo(ITypeDescriptorContext context, Type destinationType)
    {
        return destinationType == typeof(string) || base.CanConvertTo(context, destinationType);
    }
    const string FORMAT = "dd MMM yyyy";
    public override object ConvertFrom(ITypeDescriptorContext context, System.Globalization.CultureInfo culture, object value)
    {
        if (value != null && value is string)
        {
            string s = (string)value;
            return DateTime.ParseExact(Reverse(s), FORMAT, CultureInfo.InvariantCulture);
        }
        return base.ConvertFrom(context, culture, value);
    }
    public override object ConvertTo(ITypeDescriptorContext context, CultureInfo culture, object value, Type destinationType)
    {
        if (destinationType == typeof(string))
        {
            return Reverse(((DateTime)value).ToString(FORMAT, CultureInfo.InvariantCulture));
        }
        return base.ConvertTo(context, culture, value, destinationType);
    }
    static string Reverse(string value)
    {
        char[] data = value.ToCharArray();
        Array.Reverse(data);
        return new string(data);
    }
}
class MyForm : Form
{
    public MyForm()
    {
        DataGridView grid = new DataGridView();
        grid.Dock = DockStyle.Fill;
        List<Person> people = new List<Person>();
        people.Add(new Person { Forename = "Fred", Surname = "Flintstone", EditableValue = DateTime.Today });
        people.Add(new Person { Forename = "Barney", Surname = "Rubble", EditableValue = DateTime.Today.AddDays(-25) });
        grid.DataSource = people;
        Controls.Add(grid);
    }
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new MyForm());
    }
}

How is the Spring MVC spring:bind tag working and what are the meanings of status.expression and status.value ?

Let's discuss the following example:

<spring:bind path="user.userName">
    <input type="text" name="${status.expression}" value="${status.value}"/>
    <span class="fieldError">${status.errorMessage}</span>
</spring:bind>

When this view snippet gets rendered, what do ${status.expression} and ${status.value} get evaluated to? Where do these values come from?

From stackoverflow

See this link for an explanation of what the status variables mean.
- status.expression: the expression that was used to retrieve the bean or property
- status.value: the actual value of the bean or property (transformed using registered PropertyEditors)
- status.errorMessages: an array of error messages, resulting from validation
The status object is evaluated when the binding is done.

Also keep in mind that Spring 2.0 introduced new form tags, which are probably better suited for your needs.

Koraktor : Thanks for the reminder that the value of `status.value` is generated by the registered PropertyEditor. I implemented a custom editor for setting a property and omitted the `getAsText()` method. This will result in `"null"` strings!

What is the difference between for..in and for each..in in javascript?

What is the difference between for..in and for each..in statements in javascript? Are there subtle differences that I don't know of or is it the same and every browser has a different name for it?

From stackoverflow

"for each...in" iterates a specified variable over all values of the specified object's properties.

Example:
```
var sum = 0;
var obj = {prop1: 5, prop2: 13, prop3: 8};
for each (var item in obj) {
  sum += item;
}
print(sum); // prints "26", which is 5+13+8
```
Source

"for...in" iterates a specified variable over all properties of an object, in arbitrary order.

Example:
```
function show_props(obj, objName) {
   var result = "";
   for (var i in obj) {
      result += objName + "." + i + " = " + obj[i] + "\n";
   }
   return result;
}
```
Source

Vijay Dev : Is this browser specific ?

Christoph : @Vijay: yes - it was introduced in JavaScript 1.6, ie a Mozilla extension
Read the excellent MDC documentation.

The first is for normal looping over collections and arbitrarily over an object's properties.

A for...in loop does not iterate over built-in properties. These include all built-in methods of objects, such as String's indexOf method or Object's toString method. However, the loop will iterate over all user-defined properties (including any which overwrite built-in properties).

A for...in loop iterates over the properties of an object in an arbitrary order. If a property is modified in one iteration and then visited at a later time, the value exposed by the loop will be its value at that later time. A property which is deleted before it has been visited will not then be visited later. Properties added to the object over which iteration is occurring may either be visited or omitted from iteration. In general it is best not to add, modify, or remove properties from the object during iteration, other than the property currently being visited; there is no guarantee whether or not an added property will be visited, whether a modified property will be visited before or after it is modified, or whether a deleted property will be visited before it is deleted.

The latter allows you to loop over the values of an object's properties.

Iterates a specified variable over all values of object's properties. For each distinct property, a specified statement is executed.

This demonstration should hopefully illustrate the difference.

var myObj = {
    a : 'A',
    b : 'B',
    c : 'C'
};
for each (x in myObj) {
    alert(x);        // "A", "B", "C"
}
for (x in myObj) {
    alert(x);        // "a", "b", "c"
    alert(myObj[x]); // "A", "B", "C"
}

In addition to the other answers, keep in mind that for each...in is not part of the ECMA standard and also isn't included in the upcoming edition 3.1. It was introduced in JavaScript 1.6, which is an extension of ECMAScript3 by the Mozilla Foundation.

According to the linked Wikipedia page, it's only implemented in Firefox 1.5+ and Safari 3.x(+?).

Crescent Fresh : In other words, it's "Firefox only".

Determine transferred data size for a web service call in .NET CF

Hi,

I'm developing a .NET CF client application and using web services for data transfer. I'm using SharpZipLib to compress transferred datasets so I know the size of the transferred byte array.

I wonder whether there is an easy way to determine the complete request size (HTTP headers, SOAP envelopes and the real data) for a single call. I really want to minimize the GPRS connection costs.

Thanks...

From stackoverflow

Re the overall question; sorry, I don't know short of using a network tracer...

However; can I humbly propose that datasets and SOAP are not always the best choice on bandwidth restricted devices? Compression does a good job, but not always ideal. Unless you need the features offered, simpler protocols are available (such as POX, perhaps using inbuilt protocol compression (GZIP/Deflate)).

At the other end of things... if you can phrase things as messages, then serializers like protobuf-net might be useful (combined with raw binary posts); they are very data dense (such that attempts to use compression inevitably increase the size). However, you'd need to do your own data/change tracking at the client, and the RPC stack is as-yet incomplete (I've got working prototype code, but I haven't committed it yet, as I'm still unit testing it). The server would also be different (i.e. not an asmx or whatever - perhaps a rigged handler or MVC controller).

As another alternative - ADO.NET Data Services might be of interest, especially in JSON mode (for bandwidth, again using protocol compression).

xarux : You are right, may be web services is not a good choice but I don't think I have time to change all the structure. I also use db4o on client devices which supports server-client type communication but I'm not sure it support compression.
Wireshark is a famous protocol analyzer tool. However it may be an overkill for your needs.

Also check out Fiddler. This is easier and it will allow you to monitor traffic from an emulator.

tcpmon is a Java utility that can sit between a server and a client. You need to edit the endpoint in your application to connect to tcpmon and configure tcpmon to proxy all requests to the actual web service. It shouldn't take more than 10 minutes - it is a very simple utility. Then you can monitor the raw requests in tcpmon or capture traffic with Fiddler.

xarux : I have used Fiddler before but never tried Wireshark. From WM Emulator which connects to internet through Activesync Fiddler doesn't capture the traffic. But thanks for reminding these two.
WCF supports message tracing, which would let you see the size of the generated SOAP+Message. You could use these trace files to determine what you are looking for, although with compression on your communications the bytes sent will obviously be fewer. For the actual on-the-wire size, Wireshark would be a good bet. Or you could zip the message pulled from the WCF trace and get a rough idea.

Arithmetic overflow error converting expression to data type datetime.

This select statement gives me the arithmetic error message:

SELECT CAST(FLOOR((CAST(LeftDate AS DECIMAL(12,5)))) AS DATETIME), LeftDate 
FROM Table
WHERE LeftDate > '2008-12-31'

While this one works:

SELECT CAST(FLOOR((CAST(LeftDate AS DECIMAL(12,5)))) AS DATETIME), LeftDate 
FROM Table
WHERE LeftDate < '2008-12-31'

Could there be something wrong with the data (I've checked for null values, and there are none)?

From stackoverflow

Found the problem to be when a date was set to 9999-12-31, probably too big for the decimal to handle. Changed from decimal to float, and everything is working like a charm.
In general, converting a date to a numeric or string, to perform date operations on it, is highly inefficient. (The conversions are relatively intensive, as are string manipulations.) It is much better to stick to just date functions.

The example you give is (I believe) meant to strip away the time part of the DateTime; the following does that without the overhead of conversions...
```
DATEADD(DAY, DATEDIFF(DAY, 0, <mydate>), 0)
```
This should also avoid arithmetic overflows...

gbn : The only way to do it...

What does this mean in Prism/Unity: Container.Resolve<ShellPresenter>()

(from the StockTraderRIBootstrapper.cs file in the Prism V2 StockTrader example app)

What is the difference between this:

ShellPresenter presenter = new ShellPresenter();

and this:

ShellPresenter presenter = Container.Resolve<ShellPresenter>();

I understand the second example is treating the container like a factory, walking up to it saying "I need an instantiated object of type ShellPresenter".
But what if, e.g. I need to send parameters, what would be the equivalent of "new ShellPresenter(1, true)" etc.?
And since the Container has to be told about the ShellPresenter, I expected to find somewhere in the project a place where the ShellPresenter class is registered with the container, e.g. I was expecting something like this:

Container.RegisterType<IShellPresenter, ShellPresenter>();

but found it nowhere. So how does the container get to know about these types so it can .Resolve them? I rebuilt this in its own project and get a "Resolution of the dependency failed" error, where do I need to register this dependency then?

Any direction/discussion here would be helpful.

Unexplained Answer:

So, in the bootstrapper, when I register the Shell itself:

protected override void ConfigureContainer()
{
    Container.RegisterType<IShellView, Shell>();
    base.ConfigureContainer();
}

then the Container can resolve the ShellPresenter type. So how is the ShellPresenter type registered when I register the Shell type?

The Surprising Answer:

Ok, so it turns out that you don't have to register the type you are trying to resolve but you do have to register the parameter (interface) types passed to the constructor of the type you are trying to resolve, i.e. since I inject the IShellView interface into my ShellPresenter's constructor, I needed to register the IShellView type and not the IShellPresenter type:

public ShellPresenter(IShellView view) ...

I tested this by trying to resolve the type Tester:

Tester tester = Container.Resolve<Tester>();

As long as I inject SomeClass into its constructor:

public Tester(ISomeClass someClass)

I get unresolved dependency errors until I register SomeClass with the container:

Container.RegisterType<ISomeClass, SomeClass>();

Then it works. This is as surprising as it is educational. Needs to sink in. I'm going to go get a coffee and think about this for awhile.

If anyone can elaborate on why this is the case, it would be much appreciated.

From stackoverflow

Well, I can't answer for Unity, but for Castle Windsor, the registration could be in the app.config/web.config file. There is also the ability to add the parameters in the config xml.

This allows you to change the implementation and configuration of the object without having to recompile your application.

Edward Tanguay : interesting, ok, I couldn't find it in App.config though, I'm working through the StockTrader demo in the Prism V2 guidelines.
You understand the basics.

There are overloads for resolving types that require constructor arguments. Alternatively, you can always code your types to have a parameterless constructor.

The point of DI containers is that you can configure them to change the type that gets resolved for a particular interface without recompiling your software. The code example you provided for configuring the provider can't be changed at runtime. That's why most dependency injectors allow you to configure them in app.config/web.config/some other external configuration file. That way you can reconfigure your app to inject a different type without recompiling, which is the true power of DI frameworks like Unity.

Edward Tanguay : but where is the ShellPresenter being registered with the container? The only contact I can find between the two is that the ShellPresenter gets the container injected in its constructor. ShellPresenter is not in the app.config.

Will : Look, it's not magic. In your example app, either it's being registered in code or it's being registered in a configuration file. If you haven't found it, it's because you're looking in the wrong place.

Will : Also, provide a link to the demo project; I can't find it anywhere.

Edward Tanguay : As soon as I register IShellView as a type with the container, the container is able to resolve ShellPresenter. This is because ShellPresenter gets IShellView injected in its constructor. See above. The odd thing is ShellPresenter is never registered yet I can resolve it.

Edward Tanguay : To get the StockTrader app running, you can follow my instructions here: http://www.tanguay.info/web/index.php?pg=books&id=23
In Unity, there is indeed a Container.RegisterType<TFrom, TTo>() method set that registers types at runtime. It's probably more common to do it using an XML configuration file, but either works.

Interestingly, in Unity there is no Container.Resolve<T>(params object[] parameters) -type method to resolve a type with specific constructor parameter values. Unity is built on top of ObjectBuilder which is the P&P team's library for doing object construction and wireup (IIRC it was originally written for ObjectSpaces but has been significantly enhanced now). ObjectBuilder gives you the ability to inject dependencies in various ways, including via the constructor, so you could say - for example - that you would pass a new instance of a type it's dependent on into the constructor of a resolved type; but that type would also have to be registered. You can also pass instances of registered types (a registered instance / singleton etc). But AFAICS there is no way to simply give it a value to pass in.

I think doing that would go against the philosophy of IoC to some extent, which is why they don't provide that facility. The container should, in theory, be able to give you a complete object graph in any given circumstance, so you should never have to pass parameters in, and making your objects dependent on constructor parameters other than injectable object dependencies (which the container will resolve for you) is seen as Bad Design.

I can't speak for Windsor, StructureMap or the others, which may allow you to do this. I can't even say categorically that Unity has no way to do it since I'm reasonably new at it, but IIRC Chris Tavares - who basically built Unity - hangs out here from time to time, so maybe he'll drop by and answer this one :-)
If you try to resolve a concrete class and have not registered an instance or sub-class to satisfy it, then Unity will construct an instance of the concrete class for you, resolving any dependencies that it has.

So when you ask for ShellPresenter, and haven't registered it, Unity just new's up a ShellPresenter for you with the ShellView as a parameter.

Metro Smurf : +1 - after reading this simple statement a few times, you've made quite a bit of sense!

Machine regularly segfaults

When I use apt-get to install or upgrade my Ubuntu hardy system, I often get messages like this:

$ sudo apt-get install foo
Reading package lists... Done
Segmentation faulty tree... 50%

$ sudo apt-get install foo
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Segmentation fault

$ sudo apt-get install foo
Reading package lists... Done
Building dependency tree       
Reading state information... Done
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct NULL not valid
Aborted

cc  -Os -g -Wall -DLOCAL_ROOT=\"/usr/share/polipo/www/\" -DDISK_CACHE_ROOT=\"/var/cache/polipo/\"  -DCHUNK_SIZE=16384   -c -o http_parse.o http_parse.c
http_parse.c:1564: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions,
see <URL:file:///usr/share/doc/gcc-4.2/README.Bugs>.
make: *** [http_parse.o] Error 1

It doesn't happen all the time, so if you retry enough you can usually get things done, but it does segfault an annoyingly large percentage of the time.

Any idea what's going on or how to fix it?

From stackoverflow

If standard tools are intermittently failing, particularly with memory errors like that, it's time to suspect the hardware.

Run memtest (from the grub menu is best), and leave it to run for several cycles.
Douglas gave the correct direction. Running memtest is the least expensive option.

Clocking down your CPU (if possible) is another option. If you have spare RAM modules lying around, try swapping them in for your current ones. If nothing works and you still have the same problem, suspect the mainboard.
I had a similar problem and it turned out that the CPU fan bracket had cracked, which made the temperature skyrocket.

Also make sure your memory passes a test. Boot the machine from your Ubuntu installation disk and choose the memory test. Let it run for about an hour; if there are memory problems, they will show up in the list of errors.
Software installation is probably taxing the system a bit more than "regular" use, which can cause inherent problems to creep out of hiding. In addition to the suggestions given, if you start testing/swapping hardware components, start with the power supply.

It might be the PSU that is "dipping" in voltage under load, which wreaks havoc with the system. Luckily, PSUs are quite cheap, and it's far easier to swap out a PSU than a motherboard.

pi : I once had a problem with a weak PSU. The machine did a cold boot though when running under heavy load.

How to correctly boost results in Solr Dismax query

I have managed to build an index in Solr which I can search on keyword, produce facets, query facets etc. This is all working great. I have implemented my search using a dismax query so it searches predetermined fields.

However, my results are coming back sorted by score which appears to be calculated by keyword relevancy only. I would like to adjust the score where fields have pre-determined values. I think I can do this with boost query and boost functions but the documentation here:

http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3

is not particularly helpful. I tried adding a bq argument to my search:

&bq=media:DVD^2

(yes, this is an index of films!) but I find when I start adding more and more:

&bq=media:DVD^2&bq=media:BLU-RAY^1.5

I see negative effects: e.g. films that are DVD but not BLU-RAY are negatively affected in their score. In the end it all seems to even out and my score is as it was before I started boosting.

I must be doing this wrong and I wonder whether "boost function" comes in somewhere. Any ideas on how to correctly use boost?

From stackoverflow

It sounds like you need to apply the boost at index time instead of query time. So when you are preparing documents to be added to the index, give those that are DVD a boost of 2, and those that are Blu-Ray a boost of 1.5.
Apparently it is normal for films that are DVD but not BLU-RAY to be negatively affected in their score. This is because the more constraints you add, the more the queryNorm value is reduced, and all scores are multiplied by this value.
This is a little late and it looks like you probably already have what you are looking for, but...

If you're curious about boost functions (which, judging by your desired results, I think you should be) you should check out the bf argument instead of the bq argument.

Try setting the bf argument to
```
media:DVD^2 media:BLU-RAY^1.5
```
and I think that could achieve what you want.

How do I display custom strings when multiple items are selected?

I have a property grid that helps me manage all of the controls on a form. These controls are for designer-type folks, so I'm not really worried that much about the user interface... until someone selects multiple objects.

I have a UITypeEditor for the "EffectiveDiameter" property on these common objects. It keeps track of units (meters vs feet) and does some nice things on-the-fly. However, when someone selects two or three common objects, EffectiveDiameter is blank, even though it evaluates to the same text string.

For example, in most controls, Microsoft has the "Anchor" property that has a text output of "Top, Right". When you pull it down it is an object with a nice UITypeEditor. Yet, when you select five objects on your form that all have the same Anchor setting you can still see the string "Top, Right" in the property grid.



/// <summary>
/// The default containing class for all Unit-Management Conversion classes.
/// </summary>
[
 Serializable,
 EditorAttribute(typeof(umConversionTypeEditor), typeof(UITypeEditor)),
 TypeConverter(typeof(umConversionTypeConverter)),
]
public class umConversion
{
    ...
}


public class umConversionTypeEditor : UITypeEditor
{
    ...
}



// Now, in my designer class, I have ...
private double _effectiveDiameter { get; set; }

[DisplayName("Effective Diameter")]
public virtual umConversion EffectiveDiameter
{
    get
    {
        umConversion ret = new umConversion(_effectiveDiameter);
        ret.MeasureInSI = _si;
        return ret;
    }
    set
    {
        _effectiveDiameter = value.ImperialUnits;
    }
}
}

If I select several of my custom objects -- all with the same effective diameter -- how do I get EffectiveDiameter to display in the PropertyGrid like Anchor does? Right now, that field is always blank.

From stackoverflow

Place a TypeConverter attribute on each property too..

Jerry : I have a TypeConverter on the class itself. However, to ensure that there wasn't a trick to it, I did place a TypeConverter (that converts to/from string) on an attribute to see if it worked. It didn't. =(

leppie : Sorry :( Have you tried providing a common base class or interface, if possible? If not, you might have to revert to some magic with property descriptors and ensure you return the same instance (of the PropertyDescriptor) for all instances.
From your description it seems you are talking about the effectiveDiameter of type double. But in your sample you are showing a UITypeEditor for the umConversion class. So, please clarify.

If you are in the second case, where your property is actually a class of your own and not a primitive type, then make sure you implement the Equals method. The MultiPropertyDescriptorGridEntry class in the MS PropertyGrid calls Equals to know if all target instances have the same value. Not sure if this is your problem, but I would test that to eliminate this possibility.

Jerry : YOU ARE A GENIUS!!! I had a "Equals()" test, but there was a typo in it. THAT was my problem all along. THANK YOU SO VERY VERY MUCH!!!!

SetItemChecked(index,boolean) VS SetItemCheckState(index,Checkstate)

I just stumbled upon the SetItemChecked and SetItemCheckState when working on a checked listbox.

SetItemChecked takes a list index and true/false to set the list item to either checked or unchecked.

SetItemCheckState takes a list index and a CheckState enum value to set the state to checked, unchecked or Indeterminate.

Why does SetItemChecked exist, when SetItemCheckState covers all states?

Is it just to make it easy to use a boolean because it's the most common use to check/uncheck, or have I missed something?

And if it is just to make it easier to use a boolean, why not have that as an overload of the function?

Thanks.

From stackoverflow

I vote for

Is it just to make it easy to use a boolean because it's the most common use to check/uncheck, or have I missed something?

But the extra function normally won't be used (it only adds the Indeterminate case).

EDIT: CheckedListBox.SetItemCheckState(1, CheckState.Checked) is horrible to read (but clear); CheckedListBox.SetItemChecked(1, true) is easier to read.
There is no real difference. SetItemChecked calls SetItemCheckState like so:
```
public void SetItemChecked(int index, bool value)
{
    this.SetItemCheckState(index, value ? CheckState.Checked : CheckState.Unchecked);
}
```
Like you've already assumed, I guess it's just a helper-method to make things simpler for the developer.
It is because it is much more common to want to check/uncheck depending on a boolean value, and to avoid having to do this:

listbox.SetItemCheckState(i, value ? CheckState.Checked : CheckState.Unchecked)

all the time. Also, it is not an overload because it mirrors the Checked and CheckState properties of the CheckBox.

PHP and ?? operator

As some people know, C# has a very useful ?? operator, which evaluates and returns the expression on the right if the expression on the left is null. It's handy for providing default values, for example:

int spaces = readSetting("spaces") ?? 5;

If readSetting couldn't find "spaces" and returns null, variable spaces would hold default value of 5.

You could do almost the same in JavaScript and Ruby with || operator, as in

var spaces = readSetting("spaces") || 5;

although in this case you couldn't have 0 as the value of spaces in JavaScript, nor false in either Ruby or JavaScript.

PHP has the or operator, and although it doesn't work like || in the sense that it doesn't return the expression on the right, it can still be useful here:

$spaces = readSetting('spaces') or $spaces = 5;

with the caveat that "" and "0" are also treated as false in PHP, in addition to false, 0 and null as in most languages.

The question is, should I use the construction above? Does it have side effects apart from treating a large class of values as false? And is there a better construction that is commonly used and recommended by the PHP community for this task?

From stackoverflow

If you want to guarantee that you get false or null, and not treat things like "0" as "false", you could do the following:
```
$spaces = readSetting('spaces');
if ($spaces === null || $spaces === false) $spaces = 5;
```
The === operator in PHP looks for an identical match, so "0", "", won't equate to false. Obviously you might want to restyle the code to your liking, but you get the idea.
It is a good idea to be more explicit in cases like this, especially in PHP since it has somewhat confusing type conversion rules (e.g. "0" is false as you pointed out).

If you want to be strict, let your function readSettings return the correct setting or a well-defined value if no setting is found, e.g. null. Then you should use it as:
```
$spaces = readSettings('spaces');
if (null === $spaces) {
    $spaces = 5;
}
```
If you want to be more generous and only want $spaces to be non-empty, use:
```
$spaces = readSettings('spaces');
if (empty($spaces)) {    // or:  if (!$spaces) {
    $spaces = 5;
}
```
You could shorten it at the cost of a second function call or ugly constructs (not recommended):
```
$spaces = readSettings('space') ? readSettings('space') : 5;
$spaces = ($x = readSettings('space')) ? $x : 5;  // UGLY!
```
But watch out if you want 0 to be a valid value for $spaces!

From the Zen of Python:

Explicit is better than implicit.

In your case I would recommend just adding a second parameter $default to readSettings():
```
function readSettings($key, $default=null) {
    global $settings;  // assumption: the settings live in a global array
    return isset($settings[$key]) ? $settings[$key] : $default;
}

$spaces = readSettings('spaces', 5);
```

How about this?

$spaces = ($tmp=readSettings('space')) ? $tmp : 5;

Or, more confusing but using only one variable:

$spaces = ($spaces=readSettings('space')) ? $spaces : 5;

PHP 5.3.0 gives a shorter version of the ternary operator condition?true:false which goes as follows:
```
$spaces = readSettings('spaces') ?: 5;
```
Please note that PHP 5.3.0 is still in beta and not yet production ready (it's a release candidate already though), but it brings a lot of cool new stuff like lambda functions and namespaces too, so the features are definitely worth checking out!

Here is an article that describes the new features pretty well:

http://www.sitepoint.com/article/whats-new-php-5-3/

therefromhere : I can't see any reference to ?: in the linked article.

How to undo the last checked checkbox?

Hi, I created a JavaScript function to check if the number of modules selected is greater than a given value. Each time a checkbox is clicked the function is called, and it goes through all the checkboxes and calculates the total to see if it's greater. The problem is that when the user checks a checkbox and the total credits is greater than the value, I want to set that checkbox to checked=false, but I don't know which checkbox to undo. Is there any "undo last click" function in JavaScript?

From stackoverflow

I know this isn't the answer you were looking for, but wouldn't it be a better user experience to disable all unchecked checkboxes when the maximum number of checks has been reached?

Seb : This is not normal behavior for any user... it would not make for a user-friendly experience.

Christoph : @SebaGR: why? displaying a message that informs the user that the total amount has been reached and then disabling all unchecked boxes and visually mark the appropriate labels sounds perfectly usable to me

Danita : I agree. You could extend the experience by calculating the amount remaining and disabling those checkboxes which if checked would add to more than the max amount. That way, if max=50 and you've selected 47, checkbox with value=5 would be disabled and checkbox with value=3 will still be available :)
I don't think there is, but you can always save a reference to the last checked box, so when you have to un-check it, you have it right there.

Seb : That is exactly what he's asking for! :P

Mg : Same answer you gave him, without the code you put there xD.
Not really, but you could fake it easily enough by saving the last box clicked. This code may not work verbatim, but you get the idea:
```
<script>
var last_checked_box;

function onBoxClicked( box ) {
   if ( box.checked ) last_checked_box = box;
}

function undoLastBox() {
   if ( last_checked_box ) last_checked_box.checked = false;
}
</script>

<input type="checkbox" id="box1" onClick="onBoxClicked(this)"/>
...
```
Seb : This would do the trick, but here you're missing the proper functions to calculate the upper limit. I'm giving you a complete solution in my answer :)

Christoph : @Eric: why save the boxes' id and not a reference to the box itself, ie `last_checked_box = box`?

Eric Petroelje : @SebaGR - Your answer is better for his particular situation, but somebody else who might find this question on a Google search might not have the same requirements. I also didn't want to assume he would be using jQuery.

Eric Petroelje : @Christoph - no particular reason, it was just the first thing I thought of :)

Christoph : @Eric: ok. it's just that doing it this way reminds me of telling the person standing next to you to call on your mobile if they want to speak to you ;)

Eric Petroelje : @Christoph - ok, ok, edited to remove unnecessary cruft :)

Using jQuery, you could do something like:

var MAX_CREDITS = 50; // just as an example
$("input[type=checkbox]").change(function (){
  var totalCredits = 0;
  $("input[type=checkbox]").each(function (){
    // augment totalCredits accordingly to each checkbox.
  });

  if(totalCredits > MAX_CREDITS){
    this.checked = false; // unchecks the box that was just clicked, regardless of jQuery version
  }
});

If you've never used jQuery before, this is surely a pain for your eyes; but as you can see, it's very powerful and your problem can be solved in a few lines. I'd recommend learning it and giving it a try ;)

I suggest either testing whether the max number of objects has been checked and, if so, removing the check in the same method, or storing the id of your last checked checkbox in some hidden text box so that you can reference it again.

Here's a minimal, but fully functional and jQuery-less solution:

<script>
function isBox(element) {
    return element.form && element.form === document.forms[0] &&
        element.nodeName.toLowerCase() === 'input' &&
        element.type === 'checkbox';
}

function remaining(dR) {
    var field = document.forms[0].elements[0],
        value = +field.value;

    if(dR) return field.value = value + dR;
    else return value;
}

function listener(event) {
    var box = (event && event.target) ||
        (window.event && window.event.srcElement);

    if(isBox(box)) {
        if(box.checked) {
            if(remaining() > 0)
                remaining(-1);
            else box.checked = false;
        }
        else remaining(+1);
    }
}

if(document.addEventListener)
    document.addEventListener('click', listener, false);
else if(document.attachEvent)
    document.attachEvent('onclick', listener);
</script>
<form>
 <p>remaining: <input type="text" readonly="readonly" value="2"></p>
 <p><input type="checkbox" id="b1"><label for="b1">box1</label></p>
 <p><input type="checkbox" id="b2"><label for="b2">box2</label></p>
 <p><input type="checkbox" id="b3"><label for="b3">box3</label></p>
</form>

Separating groups into nearly-equal stacks

I have a list of documents and I want to display them grouped by the first letter of their name on a web-page, over three columns.

In short, something like this:

A | C | E
A | D | F
B | D | F
B | D | F
  | D |

An important difference from say, the Windows Explorer view style is that I want letters to stay with each other. No breaking mid-group. To accommodate this, I don't care if one column is a couple of entries too tall.

I've started by sorting the array of documents by name and splitting them into a nested array. So I know (or can easily find out):

How many unique letters there are
How many letters there are in each group
How many entries there are in total
The mean average of how many values there should be in each column (ideally but not necessarily)

I don't care what your answers come in. I'm looking for the algorithm rather than the implementation, so you can code in anything you like (except perhaps Fortran). An explanation in HTML might be a toughie too.

I invite somebody to go wild on the tags because I couldn't think of anything relevant and no, this isn't homework, so please don't mark it as such.

From stackoverflow

Well, you can expect always to have some extra rows in each column. I mean, if you have 2 A's, 2 B's and 33 C's, then the third column would be pretty tall compared to others.

It's not the Knapsack problem because they have to be in order.

What you can do is:
- Count the number of items.
- See where the first third would fall.
- If it's exactly a letter change spot, then you're lucky :)
- If not, then minimize the distance between the third part split spot and the previous / next letter change spot - i.e. if there's a letter change 2 entries before and 10 entries after, then go for the previous one.
- Finally, take the rest, divide by two and follow the same logic to split as near as you can from the mean value.
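
The steps above could be sketched roughly like this in Python. This is only an illustration: the function name split_near_mean and the idea of passing the per-letter counts as a plain list are my assumptions, and it expects at least as many letter groups as columns.

```python
from itertools import accumulate


def split_near_mean(counts, columns=3):
    """Split ordered group sizes into `columns` runs, cutting only between groups.

    For every column but the last, cut at the letter boundary closest to an
    equal share of the items that have not been placed yet.
    Assumes len(counts) >= columns.
    """
    result = []
    remaining = list(counts)
    for cols_left in range(columns, 1, -1):
        target = sum(remaining) / cols_left
        # Running totals at each boundary; keep one group for each later column.
        boundaries = list(accumulate(remaining))[: len(remaining) - (cols_left - 1)]
        # Index of the boundary whose running total is nearest the target
        # (the earlier boundary wins a tie).
        cut = min(range(len(boundaries)), key=lambda i: abs(boundaries[i] - target))
        result.append(remaining[: cut + 1])
        remaining = remaining[cut + 1:]
    result.append(remaining)
    return result


# Letter counts for A..F from the question's example.
print(split_near_mean([2, 2, 1, 4, 1, 3]))
```

Each inner list holds the letter counts that end up in one column, so the column heights are simply the sums of the inner lists.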
leppie : It is more like the packing problem, which is a derivation of the knapsack problem; they all fall under combinatorics (or whatever you call that).
Perhaps it helps if you look at the problem like this:

For your example, you have a string like this:
```
AA BB C DDDD E FFF
```
The space positions are the places where you could start a new column. Everywhere else you must not break, so that identical letters stay in the same column. So you can actually mark the space positions like this:
```
AA1BB2C3DDDD4E5FFF
```
Now you have 5 positions where you can either break the column or not. As it's a binary decision, use a string of 0's and 1's for this and brute force every possible combination:
```
12345

00000 -> no break at all, column count = 1, max. lines = 13
...
01010 -> your example, column count = 3, max. lines = 5
...
11111 -> breaks everywhere, column count = 6, max. lines = 4
```
This is a brute force attempt, but you can easily see that the number of 1's determines the column count (column count = number of 1's + 1) and that you want to minimize max. lines, so it should be possible to calculate this somehow without having to test each combination.

EDIT2: Didn't notice that you want 3 columns. This makes it easier, as you know you will have exactly two 1's, but it's still brute force.

EDIT: Another approach I'd favor:

Write the letter counts like this:
```
A B C D E F
2 2 1 4 1 3
```
You can now join letters that are next to each other. Always choose the two adjacent letters with the lowest count sum:
```
2 2 1 4 1 3 - lowest = "2 1"
2  3  4 1 3 - lowest = "1 3"
2  3  4  4  - lowest = "2 3"
  5   4  4  - stop now, as we have 3 columns now

Result: AABBC, DDDD, EFFF
```
This perhaps won't lead to the optimal solution, but it's a nice and easy way to solve your problem, I think.
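
A minimal Python sketch of the joining approach, assuming the letter counts are already available as a list in alphabetical order. The name greedy_columns is made up for illustration, and ties are resolved by taking the leftmost pair, which is a choice of mine rather than part of the description above.

```python
def greedy_columns(counts, columns=3):
    """Merge the adjacent pair with the smallest combined size until only
    `columns` runs remain. Ties go to the leftmost pair."""
    runs = [[c] for c in counts]
    while len(runs) > columns:
        # Combined size of every pair of neighbouring runs.
        sums = [sum(runs[i]) + sum(runs[i + 1]) for i in range(len(runs) - 1)]
        i = sums.index(min(sums))
        # Replace the two neighbours with their concatenation.
        runs[i:i + 2] = [runs[i] + runs[i + 1]]
    return runs


# Counts for A..F; the result corresponds to AABBC, DDDD, EFFF.
print(greedy_columns([2, 2, 1, 4, 1, 3]))
```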

Oli : Your second solution looks beautiful on paper but I'm not sure how I'd do that programmatically. Looks like there'd be an ungodly amount of repetition.

schnaader : You could easily do it programmatically. Use a list or an array that contains the letter counts at first. Then calculate sum(i)=count(i)+count(i+1) for each item and join where sum(i) is lowest. Do this until you have only 3 columns left.

zweiterlinde : One problem with your (not terribly unattractive) greedy solution is that you can't handle tiebreaks. Consider the case where you only want two breaks and the sequence is 2 1 2 2. The optimal division is 3/4, but if you group 1 to the right you can't achieve it.

schnaader : Yes, as I said, it won't lead to the optimal solution for all cases, and 3/4 is much better than 5/2. In those cases you could try joining both of the tiebreak combinations instead of just one and take the best result at the end.

Oli : I've gone with the second approach. You were right - It's pretty simple to code out once you think about it for a minute. Not terribly efficient but that's not an issue where it's being used.

tvanfosson : Note that you're implicitly relaxing the "[allow] one column [to be a] couple entries too tall" constraint with this approach.
There is no general solution to this problem given your constraints unless the input is also bounded. Consider, for example, a collection with a single document for each of the letters A, B, C, E, and F and 15 (or a million) documents starting with D. In order to group all of D in one column, the column length has to be at least 15. If you use more than two columns then at best column 1 will have a length of 3, column 2 will have a length of 15 (or a million), and column 3 a length of 2. This violates your "within a couple of entries" constraint.

You need to decide if the constraint on having columns not break on letters is important enough to warrant the potential large disparities between column sizes or if the inputs are constrained such that the problem may be solvable with the given constraints. Personally I would rethink the interface, as solving an optimization problem just to keep the letters together seems like overkill.
I think you should start by defining a kind of "measure" which will tell you which layout is the best one, e.g. take the sum of (average_size - actual_size(column))^2 over all columns. Then, because you always have 3 columns (is that right?), it should be reasonably fast to try all possible divisions and find the one minimising your measure.
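
As a rough sketch of that idea in Python: try every pair of cut positions and keep the one with the smallest squared deviation from the average column size. The function name best_split is hypothetical, and the cost is scaled by the number of columns so that it stays in whole numbers; that scaling does not change which layout wins.

```python
from itertools import combinations


def best_split(counts, columns=3):
    """Try every way to cut between groups and keep the most even layout."""
    total = sum(counts)

    def column_edges(cuts):
        return (0, *cuts, len(counts))

    def cost(cuts):
        edges = column_edges(cuts)
        sizes = [sum(counts[a:b]) for a, b in zip(edges, edges[1:])]
        # (average - size)^2 summed over columns, scaled by columns^2
        # so the arithmetic stays in integers.
        return sum((total - columns * s) ** 2 for s in sizes)

    # Each candidate is a tuple of cut positions between groups.
    cuts = min(combinations(range(1, len(counts)), columns - 1), key=cost)
    edges = column_edges(cuts)
    return [counts[a:b] for a, b in zip(edges, edges[1:])]


print(best_split([2, 2, 1, 4, 1, 3]))
```

With k letter groups and 3 columns there are only (k-1)(k-2)/2 candidate layouts, so for an alphabet-sized input the exhaustive search stays small.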
First, make a pass over the documents to build an array of letter->count tuples.

The first entry is (first letter in array) -> document 0

Then find the entries which should start the second and third columns by walking through the array, adding the counts, but stopping just before you would pass the threshold for the 2nd and 3rd column (which are 1/3 and 2/3 of the total count).
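
One way to read this walk, sketched in Python under the assumption that the letter->count tuples have been reduced to a list of counts. The name split_at_thresholds is illustrative only.

```python
def split_at_thresholds(counts, columns=3):
    """Fill columns in order, starting a new one whenever adding the next
    group would push the running total past the current column's threshold."""
    total = sum(counts)
    result = [[]]
    running = 0
    for size in counts:
        # The threshold for column k is k/columns of the total; the comparison
        # is done in integers to avoid rounding trouble.
        would_pass = (running + size) * columns > total * len(result)
        if result[-1] and len(result) < columns and would_pass:
            result.append([])
        result[-1].append(size)
        running += size
    return result


print(split_at_thresholds([2, 2, 1, 4, 1, 3]))
```

For the counts 2 2 1 4 1 3 this gives columns of 4, 1 and 8 entries, because a cut is never allowed to overshoot its threshold; picking the nearest boundary instead of the last one before the threshold would even that out.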

This problem lends itself to a recursive solution---possibly classic dynamic programming, although I haven't worked it out exactly.

You have a fixed number of potential split points, and a certain number of splits to make. You should be able to have something like

(splits, max_ht, min_ht) = split_list(list, requested_splits, 
                                      curr_max, curr_min)

The function should iterate over the potential split points and recursively call itself (with one less requested split) on the remainder of list. E.g.,

def split_list(counts, requested_splits, curr_max=0, curr_min=float("inf")):
    # Assumption: counts holds the group sizes in order, e.g. [2, 2, 1, 4, 1, 3].
    # Returned split points are offsets into the sublist each call received.
    if requested_splits == 0:
        # Whatever is left forms the last column.
        rest = sum(counts)
        return ([], max(curr_max, rest), min(curr_min, rest))
    best = None  # stays None if there are too few groups for the splits
    # Leave at least one group for every column still to come.
    for candidate in range(1, len(counts) - requested_splits + 1):
        head = sum(counts[:candidate])  # height of the column before the split
        this_max = max(curr_max, head)
        this_min = min(curr_min, head)
        (splits, new_max, new_min) = split_list(counts[candidate:],
                                                requested_splits - 1,
                                                this_max, this_min)
        # Keep the split whose tallest and shortest columns differ least.
        if best is None or (new_max - new_min) < (best[1] - best[2]):
            best = ([candidate] + splits, new_max, new_min)
    return best

Here's something you could try. Since you know your ideal column number (n), shove all your elements in the first column to start with.

Repeat the next steps as often as you see fit... it's an iterative algorithm, so the results get better fast over the first few iterations, and then your returns start diminishing.

Run through the columns sequentially.

Let the number of items in the current column be numCurrent.

If numCurrent < n, skip this column.

Track the elements that start with the first letter (groupFirst), and the last letter(groupLast) of the current column.

Calculate the number of items in the previous column (if there is one) as numPrev. If abs(n-numCurrent) > abs(n-numPrev+groupFirst), move groupFirst to the previous column.

Recalculate numCurrent.

Like before, if there is a next column, shift groupLast into it if abs(n-numCurrent) > abs(n-numNext+groupLast).

Rinse and repeat. The more rinses, the neater it should look. There will be a point at which no more changes will be possible, and also points at which it can just keep going. You decide how many iterations.
