Mchr3k - Coding

Tuesday, 1 October 2013

Picasa Web Update

Good news! I can now download my photos from Picasa Web which were previously being truncated!

I am a little concerned that I seem to be getting back some smaller files than I expected. In one case I was able to download a 2.6MB file using Web Albums on my iPad but only got a 1.46MB file from Picasa Web. However, I don't actually mind much because I can't see a visual difference and I'm glad to at least have a file which contains the whole image.

So now I can go back to using Picasa Web right? In a previous post I mentioned that I was sorting out my pictures locally and then intending to re-upload everything. I heard that it was possible to store photos in Google Drive and have them show up in Google+ Photos. This worked great to view the photos in Google+ but they didn't show up in Web Albums on iOS.

So I tried uploading an album using Picasa 3 and this seems to work fine :) Now I have to decide whether to delete everything from Picasa Web and re-upload everything or just try and identify the albums which I have actually changed.

Saturday, 21 September 2013

Storing Photos In The Cloud And Offline

These days it is easier than ever to take lots of photos. The tricky part is how to sensibly manage all of these photos. I have an existing collection of ~20GB of photos and I take new photos on my iPhone.

I want to be able to:

Manually organize my photos into albums.
Access all my photos from my iOS devices (iPhone/iPad) and on the web - and the cloud storage shouldn't cost too much!
Store selected albums offline on my devices for fast access without waiting for each picture to download.
Easily sync photos/albums from the cloud down to my PC for the purposes of local backup.

Until recently I was managing this by using Picasa Web as my cloud store and the excellent Pixite app "Web Albums" on my iPhone/iPad to handle uploads and local album caching. This worked great until a problem with getting my photos back out of Picasa Web caused me to lose my trust in the service. This also made me realise that my current approach didn't include an easy way to sync all of my photos to my PC to protect me from cloud service issues.

Looking At The Alternatives
I have spent some time browsing through existing reviews of the alternatives and so far I haven't managed to find anything which quite fits my requirements.

Offline Caching
The easiest way to filter out much of the competition is to look at the support for offline caching within their iOS apps.

Pixite - developers of similar iOS apps for several cloud services which all support offline caching:

Picasa Web
Dropbox
Flickr

Everpix - automatically syncs some photos locally. No control over exactly which photos.

I also tried the following apps which didn't meet my requirements:

Google Drive - allows individual files to be cached for offline access. This is useless when I want to make whole albums available offline.
Amazon Cloud Drive Photos - no support for organizing photos into albums.
No offline caching support:

SkyDrive
SugarSync
Picturelife
Loom

So the candidates at this point are (ignoring Picasa Web which I am trying to move away from):

Dropbox
Flickr
Everpix

Uploading Photos From My PC
I currently prefer to organize my photos into folders on my PC. Any service which I choose to use needs to make it easy to upload my existing collection of photos in folders.

Flickr quickly fails this test as the desktop uploader wants to upload a flat stream of photos rather than folders of photos. The closest Flickr equivalent of a folder is a "photoset" but these have to be created manually during upload - not exactly the easy sync I was hoping for! There are independently developed tools available but none of them seemed to offer exactly the kind of sync which I was looking for. I was also put off by the Flickr service, which seemed complex to set up for storing photos privately - it feels very similar to Facebook in that all of the defaults want you to share your photos!

Dropbox ticks the box for ease of upload as it has an excellent desktop sync client that will simply upload my existing folders of photos.

Everpix provides a desktop uploading client and allows manual re-download of photos. However photos can't be manually organized into albums. I also found that it took a few minutes for each uploaded photo to become accessible. Together these limitations make Everpix unusable for me.

Affordable Storage
Dropbox charges $100/year for 100GB.

I currently have a grandfathered Google storage plan - $5/year for 20GB. However if I want more space the closest normally available plan on Google is $5/month ($60/year) for 100GB.

It is worth mentioning that Amazon Cloud Drive actually has low prices at various capacities - 20GB for $10/year, 50GB for $26/year and 100GB for $52/year (all these prices are approximate as I have converted from GBP). What a pity that the Amazon Cloud Drive Photos iOS app is so poor.

What About iCloud?
As far as I'm concerned iCloud is a total failure for photos. iCloud is not a cloud photo store. iCloud Photo Stream is a cache of recent photos which makes it simple to push photos from the device where you take them to all of your other Apple devices.

Verdict
I already have a total of 36GB of storage with Google at a price I am very happy with. They even have a service (Picasa - becoming Google+ Photos) dedicated to photo management and a 3rd party app (Web Albums by Pixite) which provides offline caching. Unfortunately I just don't trust them to reliably store my photos. I am also frustrated to see Picasa Web redirecting to Google+ Photos which lacks simple features like sorting my albums by their names.

Google Drive would be a great alternative that uses the same storage but is run by a different team in Google that might do a better job with my files. Unfortunately there is currently no way for me to cache entire folders stored in Google Drive offline on my iPhone.

Dropbox seems to be the clear winner amongst the competition as it also provides an easy to use file syncing desktop app and allows me to continue to use a Pixite app on my iPhone to manage my uploads and local caching.

I'll give it a few weeks before I actually move all of my stuff. If Picasa starts working again in that time I might consider sticking with that service a bit longer and just manually download newer albums periodically to back them up locally. It is also possible that Google Drive will add folder level caching at some point.

Tuesday, 17 September 2013

Why You Shouldn't Trust Picasa To Store Your Photos

tl;dr
A while ago I uploaded this picture to Picasa:

Recently I tried to download this photo and got this back from Picasa:

WTF!

Update: I Found A Workaround!
It turns out that Google Takeout provides the ability to download individual albums/all photos from Picasa Web. I tried downloading one of my albums which appeared corrupted through Picasa Web/Picasa for Windows and it worked! All of the photos in the album were downloaded correctly! I will probably use this to extract all of my photos and move to another cloud photos service.

Update 2: I managed to extract all my photos using Google Takeout. 17 photos failed to download and Google Takeout generated an error report which included links that I could use to manually download the missing photos.

Update 3: This bug appears to be fixed now.

Background
For the last two and a half years I have been uploading all of my photos to Picasa. I also have four years of even older photos which I never got round to uploading to Picasa. I always felt guilty about this as I knew I was taking a risk - that my hard drive could fail and I would lose loads of photos. I actually have multiple local hard drives with copies of the photos but what if my computer was stolen or my house burnt down?

I decided to finally sort this out and get all of my photos into the cloud. I loaded up Picasa for Windows and got it to index all of my local photos. At the end of the process it offered to tag all the faces it had detected in my photos. This sounded like it could be useful but I figured it would be even better to download my more recent photos and tag everything at the same time.

So I ran an Import to start downloading my cloud albums. The first album which downloaded was very large with 316 photos and it was immediately obvious in Picasa that some of the photos weren't loaded properly. I took a look on disk and found that 37/316 photos had been corrupted in the way shown above.

That's ~12% corruption in a single album!

Investigating The Issue
I earn a living as a software engineer so I immediately set to work trying to come up with a meaningful set of symptoms to report to Picasa. This is what I found:

The corruption is probably just a truncation:

Every corrupted file was very close to (but not exactly) 2MB.
The missing data was always at the "bottom" of the image.

The original file data is probably not lost:

I can actually download uncorrupted individual files using Web Albums on my iPhone. I don't think these are coming from a local cache but I'm not certain.

The corruption isn't random:

The same files in an album are always truncated when downloaded.

It isn't related to the image contents:

If I re-upload an uncorrupted image (extracted using my iPhone) then it can be downloaded again without corruption.

Picasa might be doing some kind of backend migration:

I saw two filename formats within my albums:

IMAGE_[0-9]+.jpeg (e.g. IMAGE_141.jpeg)
IMAGE_[0-9A-Z]+-[0-9A-Z]+-[0-9A-Z]+-[0-9A-Z]+-[0-9A-Z]+.JPG (e.g. IMAGE_520FFAF1-2FF7-4199-82F4-5A88F5BA8076.JPG)

All of the corrupted images have the second filename format.

The corruption occurs regardless of whether I download using Picasa for Windows or Picasa Web (tested in IE/Chrome/Firefox).
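
The symptoms above could be checked mechanically across a whole download folder. The following is a minimal sketch, not something from the original investigation: the class and method names are hypothetical, the two patterns are the filename formats listed above (with the dot escaped), and the truncation check assumes that a complete JPEG ends with the standard end-of-image marker bytes FF D9. Some valid files carry trailing bytes after that marker, so a positive result is only a hint.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper for triaging downloaded photos.
class PhotoTriage {
    // The two filename formats seen in the albums.
    static final Pattern NUMBERED = Pattern.compile("IMAGE_[0-9]+\\.jpeg");
    static final Pattern UUID_STYLE = Pattern.compile("IMAGE_[0-9A-Z]+(-[0-9A-Z]+){4}\\.JPG");

    // Returns a label for the filename format.
    static String nameFormat(String fileName) {
        if (NUMBERED.matcher(fileName).matches()) return "numbered";
        if (UUID_STYLE.matcher(fileName).matches()) return "uuid-style";
        return "other";
    }

    // Returns true when the file does not end with the JPEG end-of-image marker (FF D9).
    static boolean looksTruncated(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < 2) return true;
            raf.seek(length - 2);
            int secondLast = raf.read();
            int last = raf.read();
            return !(secondLast == 0xFF && last == 0xD9);
        }
    }
}
```

Running nameFormat over every file and looksTruncated over every file on disk would show whether the truncated files really do all share the second filename format.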

Reporting The Issue
Time to report the issue! I headed over to Picasa support and took a look at my options which turn out to be pretty limited:

Picasa Google Groups - user-to-user support
Google+ Feedback - "Although we're not able to personally reply to you, we'll use your feedback to help us continue to improve the Google+ experience."

In other words there is no real way to raise an issue and get a reply.

My issue is with Picasa and not Google+ so I decided to try out the Picasa Google Group but got no reply.

What Now?
Frankly, I am hoping that writing this blog post might get the issue some attention. I am also curious to find out whether anyone else has hit this issue. Maybe I'm the only one?

It seems to me that the Google support model only works when every issue affects enough users to show up in some kind of top level metrics. If Google have really lost a load of my photos I will be really upset but maybe Google doesn't care if there really is only one of me.

The Competition
This issue has made me revisit my decision to use Picasa. The world of cloud photos has moved on in the last few years and maybe I would be better served by another service.

I currently pay Google $5/year on a grandfathered storage plan to get an extra 20GB, for a total of 36GB. The cheapest up to date plan seems to be 100GB for $5/month.

I took a look at Flickr today and their service offering sounds pretty good - 1TB of storage for free in exchange for seeing some ads on their site sounds great. If I don't want ads I have to pay $50/year which is still cheaper than the cheapest Google plan. They even have a form where you can submit a message and they say they will actually send you an answer!

Tuesday, 23 April 2013

An Update On Finally Blocks

As I mentioned in my last three posts I have spent some time working on adding filters to the JaCoCo project. I've been meaning to provide a quick update on my progress for some time now.

Back in February I actually completed a rewrite of my finally dedup code. This rewrite dropped the dependency on line number information and instead relies on spotting duplicate bytecode that coincides with try/catch blocks. Unfortunately there turned out to be quite a few edge cases so the resulting code was fairly ugly even though it was in theory using a more sensible approach. I never got round to releasing this rewrite.

Marchof, the original author of JaCoCo, started a playground project in February to investigate how to structure filtering code in a neater way. This was mentioned in the JaCoCo developer mailing list.

I don't think anything further has happened on this since February but I am hoping to find some time to take another look at how to produce a maintainable finally block dedup implementation.

There certainly appears to be some interest in this work as I have been seeing ~60-80 downloads/week of the version of EclEmma which I publish on my own Eclipse update site ever since I first released it in January.

Monday, 14 January 2013

Improving Finally Block Handling

In my last post I described a heuristic for de-duplicating finally blocks within Java bytecode. Since then I have done some more work on this which is worth writing up.

Cleaning Up My Code

My initial implementation of the heuristic I described can be seen here. This code isn't very clear at all. My initial instinct was to improve this by adding better comments. However, I recently watched a talk which was based on a book called Clean Code which made a very interesting point:

Every Comment is a Failure (This isn't quite a hard and fast rule - comments are sometimes appropriate to express high level design decisions)

In light of this I decided to avoid comments and instead focus on improving the names of types, names of variables and names of functions. Here are some of the examples of changes which I made:

List<Instruction> => Sequence
List<List<Instruction>> => Cluster
List<List<Instruction>> cluster = findCluster(clusters, p); 
=> Cluster cluster = findClusterContainingInsn(clusters, p);

The full diff can be seen here. I think the result is much easier to read but I still need to improve or ideally strip out some of my existing comments. In some places I am still trying to use a comment to describe the following block of code which suggests I should actually be extracting out more methods.
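
To illustrate the kind of renaming described above, here is a minimal sketch of what small wrapper types could look like. It is not the code from the diff: Instruction is a hypothetical stand-in for a bytecode instruction, and the bodies of Sequence, Cluster and findClusterContainingInsn are assumptions made only so that the example compiles and runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a bytecode instruction.
class Instruction {
    final String opcode;
    Instruction(String opcode) { this.opcode = opcode; }
}

// Takes the place of a raw List<Instruction>.
class Sequence {
    private final List<Instruction> instructions = new ArrayList<>();
    void add(Instruction insn) { instructions.add(insn); }
    boolean contains(Instruction insn) { return instructions.contains(insn); }
}

// Takes the place of a raw List<List<Instruction>>.
class Cluster {
    private final List<Sequence> sequences = new ArrayList<>();
    void add(Sequence sequence) { sequences.add(sequence); }
    boolean containsInsn(Instruction insn) {
        for (Sequence sequence : sequences) {
            if (sequence.contains(insn)) return true;
        }
        return false;
    }
}

class Clusters {
    // The method name states what is being searched for, so no comment is needed at the call site.
    static Cluster findClusterContainingInsn(List<Cluster> clusters, Instruction p) {
        for (Cluster cluster : clusters) {
            if (cluster.containsInsn(p)) return cluster;
        }
        return null; // no cluster contains the instruction
    }
}
```

The named types carry the meaning that a comment would otherwise have to explain at every use of the nested lists.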

Improving Finally Block Dedup Heuristic

My initial heuristic wasn't quite good enough. Consider the following simple Java code:

    try {
      System.out.println("A");
    } catch (RuntimeException ex) {
      System.out.println("B");
    } finally {
      System.out.println("C");
    }

This compiles to the following bytecode (with catch blocks and coverage probes annotated).

 0 getstatic #16 <java/lang/System.out>
 3 ldc #22 <A>
 5 invokevirtual #24 <java/io/PrintStream.println>
 8 goto 42 (+34)

catch (RuntimeException) => 11

11 astore_1
12 getstatic #16 <java/lang/System.out>
15 ldc #30 <B>
17 invokevirtual #24 <java/io/PrintStream.println>

catch (Anything) => 31

20 getstatic #16 <java/lang/System.out>
23 ldc #32 <C>
25 invokevirtual #24 <java/io/PrintStream.println>

coverageProbe[0] = true

28 goto 50 (+22)

31 astore_2
32 getstatic #16 <java/lang/System.out>
35 ldc #32 <C>
37 invokevirtual #24 <java/io/PrintStream.println>
40 aload_2

coverageProbe[1] = true

41 athrow

42 getstatic #16 <java/lang/System.out>
45 ldc #32 <C>
47 invokevirtual #24 <java/io/PrintStream.println>

coverageProbe[2] = true

50 return

Assuming no exception was thrown coverageProbe[2] would be hit naturally. The finally block dedup heuristic would also mark coverageProbe[1] and coverageProbe[0] as hit. Working backwards from these probes would lead the RuntimeException block to be marked as covered even though no RuntimeException has been thrown.

My solution is to treat copied probes differently. For these probes coverage is only propagated up until a line where the number of Sequence duplicates changes. This prevents coverage from a finally block leaking outside the finally block. This appears to achieve the desired result.
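
The stopping rule can be sketched with a deliberately simplified model. This is an assumption-laden toy, not the JaCoCo fork's code: a method is reduced to an array holding, for each source line in order, the number of Sequence duplicates seen for that line, and the hypothetical propagateBackwards walks back from a copied probe until that number changes. The real analysis works on instructions and control flow rather than on a flat array of lines.

```java
// Toy model of limiting backwards propagation from a copied probe.
class CopiedProbePropagation {
    // duplicateCounts[i] is the number of Sequence duplicates for line i (assumed model).
    // probeIndex is the line on which the copied probe sits.
    static boolean[] propagateBackwards(int[] duplicateCounts, int probeIndex) {
        boolean[] covered = new boolean[duplicateCounts.length];
        int countAtProbe = duplicateCounts[probeIndex];
        // Stop as soon as the duplicate count differs from the count at the probe.
        for (int i = probeIndex; i >= 0 && duplicateCounts[i] == countAtProbe; i--) {
            covered[i] = true;
        }
        return covered;
    }
}
```

For the example above, the line printing "B" has one sequence and the line printing "C" has three, so propagateBackwards(new int[] {1, 3}, 1) marks only the finally line and leaves the RuntimeException handler uncovered.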

Wednesday, 9 January 2013

Java Bytecode - Finally Blocks

In my last post I talked about some new code coverage filters which I have added to my fork of the JaCoCo project. These filters are used at coverage analysis time when the recorded coverage data is analysed against the corresponding class bytecode. In the process of writing these filters I came across the grim truth behind Java finally blocks.

Finally Blocks

It turns out there is no explicit support for this Java language feature in bytecode, so its use has unfortunate consequences. Consider the following Java code:

try {
  try {
    System.out.println("A");
  } catch (RuntimeException ex) {
    System.out.println("B");
  } finally {
    System.out.println("C");
  }
} catch (RuntimeException ex) {
  System.out.println("D");
} finally {
  System.out.println("E");
}

This will end up producing the following bytecode (annotated with line numbers and exception handlers):

try{
try{
try{
try{
L3:
  0 getstatic <java/lang/System.out>
  3 ldc <A>
  5 invokevirtual <java/io/PrintStream.println>
  8 goto 42 (+34)

} catch (RuntimeException) => 11
L4:
 11 astore_1
L5:
 12 getstatic <java/lang/System.out>
 15 ldc <B>
 17 invokevirtual <java/io/PrintStream.println>
} catch (Anything) => 31
L7:
 20 getstatic <java/lang/System.out>
 23 ldc <C>
 25 invokevirtual <java/io/PrintStream.println>
 28 goto 84 (+56)

L6:
 31 astore_2
L7:
 32 getstatic <java/lang/System.out>
 35 ldc <C>
 37 invokevirtual <java/io/PrintStream.println>
L8:
 40 aload_2
 41 athrow

L7:
 42 getstatic <java/lang/System.out>
 45 ldc <C>
 47 invokevirtual <java/io/PrintStream.println>
L8:
 50 goto 84 (+34)

} catch (RuntimeException) => 53
L9:
 53 astore_1
L10:
 54 getstatic <java/lang/System.out>
 57 ldc <D>
 59 invokevirtual <java/io/PrintStream.println>
L12:
 62 getstatic <java/lang/System.out>

} catch (Anything) => 73
 65 ldc <E>
 67 invokevirtual <java/io/PrintStream.println>
 70 goto 92 (+22)

L11:
 73 astore_3
L12:
 74 getstatic <java/lang/System.out>
 77 ldc <E>
 79 invokevirtual <java/io/PrintStream.println>
L13:
 82 aload_3
 83 athrow

L12:
 84 getstatic <java/lang/System.out>
 87 ldc <E>
 89 invokevirtual <java/io/PrintStream.println>
L14:
 92 return

From a code coverage point of view this is a complete nightmare. For any branch within a finally block the number of actual branches in the bytecode is multiplied by the number of times that the finally block is duplicated (at least two).

Technically this is correct, as covering all the possible paths through the bytecode means considering all of the possible exception paths. However, in practice the developer is likely only concerned with ensuring that all the branches in their finally blocks are covered at least once and doesn't want to worry about the possible paths into the finally block.
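
A small example makes the multiplication concrete. The class below is a hypothetical illustration: its finally block contains a single if/else, which is two branch outcomes in the source. If the compiler emits three copies of the finally block, as in the listing above (normal exit, after the catch block, and the catch-anything handler), a bytecode-level tool would see 2 x 3 = 6 outcomes to cover. The exact number of copies depends on the compiler.

```java
// Hypothetical example: one source-level branch inside a finally block.
class FinallyBranchDemo {
    static String run(boolean fail, boolean verbose) {
        StringBuilder out = new StringBuilder();
        try {
            out.append("A");
            if (fail) {
                throw new RuntimeException("boom");
            }
        } catch (RuntimeException ex) {
            out.append("B");
        } finally {
            // Two outcomes in the source, but one copy of this branch per exit path in the bytecode.
            if (verbose) {
                out.append("C");
            } else {
                out.append("c");
            }
        }
        return out.toString();
    }
}
```

Calling run(false, true) and run(true, false) exercises both outcomes of the source-level branch, yet each call only passes through one copy of the finally block, so an unfiltered report would still show most of the duplicated branches as missed.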

Implementing the filtering required to deduplicate coverage of finally blocks is made tricky by the fact that the start/end of the duplicate blocks are not explicitly annotated in any way in the bytecode. The closest I could get was to write a heuristic which works during the analysis of a method:

When a new line number marker is seen a new list of instructions is started.
Instructions are added to this list as they are seen. The instruction arguments are ignored.
After all the instructions in a method have been visited:

For each line number which was seen:

The instruction sequences seen with that line number are "clustered". Sequences which start with the same instruction are grouped together and assumed to be copies caused by the line being within a finally block.

For each covered probe:

Find the instruction sequence which contains the probe.
Mark the corresponding probe in every other sequence in the cluster as covered.

Run the existing probe coverage propagation.
For each cluster:

If the last instruction of any sequence in the cluster is covered, mark all other sequences in the cluster as covered.
Disable coverage reporting for all but one sequence.
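
The first steps of the heuristic above can be sketched as follows. This is a simplified model under stated assumptions rather than the code in the JaCoCo fork: Insn is a hypothetical instruction record tagged with its line number, a line number marker is modelled as a change of line between neighbouring instructions, and a probe is modelled as a covered flag at the same position within each copy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FinallyDedupSketch {
    // Hypothetical model of one instruction; arguments are ignored, as in the heuristic.
    static class Insn {
        final int line;
        final String opcode;
        boolean covered;
        Insn(int line, String opcode) { this.line = line; this.opcode = opcode; }
    }

    // Start a new sequence whenever the line number changes.
    static List<List<Insn>> splitIntoSequences(List<Insn> method) {
        List<List<Insn>> sequences = new ArrayList<>();
        List<Insn> current = null;
        int currentLine = -1;
        for (Insn insn : method) {
            if (current == null || insn.line != currentLine) {
                current = new ArrayList<>();
                sequences.add(current);
                currentLine = insn.line;
            }
            current.add(insn);
        }
        return sequences;
    }

    // Group sequences which share a line number and start with the same opcode.
    static List<List<List<Insn>>> cluster(List<List<Insn>> sequences) {
        Map<String, List<List<Insn>>> byKey = new LinkedHashMap<>();
        for (List<Insn> sequence : sequences) {
            Insn first = sequence.get(0);
            String key = first.line + ":" + first.opcode;
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(sequence);
        }
        return new ArrayList<>(byKey.values());
    }

    // Mirror coverage at position i of any copy onto position i of every other copy.
    static void shareCoverage(List<List<Insn>> cluster) {
        for (List<Insn> source : cluster) {
            for (int i = 0; i < source.size(); i++) {
                if (!source.get(i).covered) continue;
                for (List<Insn> other : cluster) {
                    if (i < other.size()) other.get(i).covered = true;
                }
            }
        }
    }
}
```

The later steps (running the existing probe propagation and reporting only one sequence per cluster) would operate on the clusters returned here.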

This (relatively) simple heuristic is good enough to work in most cases. However it is straightforward to come up with code which will trip this up:

int i = 0;
for (i++; i < 10; i++) {
  System.out.println("A");
}

This produces the following bytecode:

L1:
  0 iconst_0
  1 istore_1
L2:
  2 iinc 1 by 1
  5 goto 19 (+14)
L3:
  8 getstatic #16 <java/lang/System.out>
 11 ldc #22 <A>
 13 invokevirtual #24 <java/io/PrintStream.println>
L2:
 16 iinc 1 by 1
 19 iload_1
 20 bipush 10
 22 if_icmplt 8 (-14)
L5:
 25 return

There are two blocks of code tagged as being on line 2 and they both start with the same instruction. With my heuristic in place one of these blocks will be ignored as they are assumed to be duplicates. This is a shame but in practice I have yet to think of an example of realistic code which would be wrongly matched by my heuristic.

Conclusions

I'm pleased that my heuristic works as well as it does but I would love to come up with a better mechanism. If you have any suggestions I would love to hear them!

Java Code Coverage: Filtering out the noise

Code Coverage is a measure of how much of your code has been executed. This is usually measured while running unit tests as a way of tracking how much production code is being exercised. It is important to note that high code coverage doesn't give any guarantee that the behaviour of the covered code is actually being asserted on by your unit tests but it is a useful indicative measure.

However, there are a couple of serious drawbacks with most simple code coverage tools.

Implicit Code

Java source code is compiled to bytecode before being executed. The most common mechanism for tracking Java code coverage is to instrument this bytecode. The problem with this approach is that it can lead to reporting of coverage (or lack of) for elements which are not present in the actual Java source code. This is frustrating as it either lowers the overall code coverage figure or forces a developer to write pointless extra tests.

Examples:

Classes without an explicit constructor have an implicit no-args constructor.
Enum classes have implicit values() and valueOf(String) methods.
synchronized (object) {} blocks are implemented in bytecode by duplicating the instructions for releasing the lock inside and outside of a catch block. This would naively require an exception to be thrown from within every synchronized block to achieve full coverage.
All finally {} blocks are implemented in bytecode by duplicating blocks of instructions possibly several times. This would naively require every branch within the finally block to be exercised for every duplicate.
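
The first two items can be observed directly with reflection. The sketch below uses hypothetical types (Plain, Colour, ImplicitMembersDemo) made up for illustration: it shows that a class with no declared constructor still has one at runtime, and that an enum has values() and valueOf(String) methods which appear nowhere in its source.

```java
import java.lang.reflect.Method;

// Hypothetical class with no constructor in the source.
class Plain {
}

// Hypothetical enum with no methods in the source.
enum Colour { RED, GREEN }

class ImplicitMembersDemo {
    // Counts the constructors present in the compiled class.
    static int constructorCount() {
        return Plain.class.getDeclaredConstructors().length;
    }

    // Checks that the compiler-generated enum methods exist in the compiled class.
    static boolean hasImplicitEnumMethods() {
        try {
            Method values = Colour.class.getDeclaredMethod("values");
            Method valueOf = Colour.class.getDeclaredMethod("valueOf", String.class);
            return values != null && valueOf != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

Because these members exist in the bytecode, a tool which instruments bytecode will report coverage for them unless they are filtered out.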

Uncovered Code

It is commonly acknowledged that 100% code coverage is an expensive goal to attempt to reach. This is usually because getting the final 5-10% code coverage involves writing pointless extra tests that add little real value and can require production code to be distorted to support the pointless extra tests.

JaCoCo

JaCoCo is an open source Java Code Coverage tool which is in active development. It is a mature project and since late 2011 has been the library behind the popular Eclipse plugin EclEmma.

Over the last few months I have been working to add some key features to this library to address the concerns discussed above.

I have updated JaCoCo to automatically filter out coverage of all of the following:

Implicit no-args constructors.
Implicit enum methods.
Synchronization exit handling.
Finally block code duplication.

I have also added support for source directives that allow blocks of code to be explicitly excluded from code coverage reports. This addresses the issue of uncovered code and allows developers to explicitly mark blocks of code which are deliberately not being covered by unit tests.

All of these changes are ready to try out now by downloading the release of my JaCoCo fork from the link below.

I am working with the JaCoCo developers to get my changes accepted back into the core JaCoCo project.

More Information

http://mchr3k.github.com/jacoco/