Canonical Distribution of Kubernetes – Release 1.5.2 | Ubuntu Insights

We’re proud to announce support for Kubernetes 1.5.2 in the Canonical Distribution of Kubernetes. This is a pure upstream distribution of Kubernetes, designed to be easily deployable to public clouds, on-premises environments (i.e. vSphere, OpenStack), bare metal, and developer laptops. Kubernetes 1.5.2 is a patch release consisting mostly of bugfixes, and we encourage you to check […]

Source: Canonical Distribution of Kubernetes – Release 1.5.2 | Ubuntu Insights

 

Scientists must fight for the facts

On 21 January, one day after the inauguration of Donald Trump as the 45th president of the United States, millions of people took to the streets in protests across the country and around the world. The marches were spurred by Trump’s treatment of women, but the focus expanded to include issues ranging from apparent hostility towards environmental regulations to disregard for the truth. Many hoped that the sobering reality of entering the White House would transform Trump’s approach into something more conventionally presidential, but the early signs are not promising.

Trump’s inauguration speech was heavy on populist and nationalist rhetoric that, if carried out, would probably herald the end of the United States’ leadership abroad. At home, he has put a freeze on hiring across the federal government, excluding the military and any positions related to national security and public safety. He also reiterated his plans to freeze regulations set in motion by his predecessor and to roll back pro-environment policies already in place.

Trump threw a bone to scientists with a pledge to explore space and to battle disease, but one of the first documents posted on the White House website was a bare-bones energy plan that emphasizes fossil-fuel development and makes no mention of the threat of climate change. The plan takes aim at “burdensome” environmental regulations and says that the Environmental Protection Agency should focus on protecting air and water, as opposed to the climate. Although it mentions — but does not define — “clean coal technology”, the plan ignores the struggling nuclear-energy sector as well as a burgeoning renewables industry that could provide countless jobs across the country in the coming decades.

Source: Scientists must fight for the facts : Nature News & Comment

 

Being a Data Scientist: My Experience and Toolset

If I had to use a few words to give myself a title for my position at UNC, I might not have said I was a data scientist. When I was starting my career there was no such thing, but looking at my CV/resume, I have:

  • Worked at a billion-dollar company, writing the integration process that pushed 40+ large datasets through complex models and analytics to produce one large modeled data product.
  • Done graduate work in text mining and data mining.
  • Written an innovative search engine from scratch and worked to commercialize it with two professors (it was their patent, but I was the programmer in the end).
  • Worked at UNC, Duke, and NC State University through Renci doing data mining, cartography, and interactive and static information visualization for various domain scientists.

I have done dozens of projects, and apparently I’ve amassed a fair bit of knowledge along the way that in some ways I never even noticed picking up. Sometimes I answer a question and I think, “How did I know that, anyway?”

Well, yesterday I started mentoring at Thinkful in their Flexible Data Science Bootcamp, and I have to say that I love it already. I like their approach, because it blends 1-on-1 time with remote learning and goes out of its way to support its mentors in being good educators and not just experts.

But as I dig through my data science know-how, I want to share it with more than just one student at a time, so this is the first in a series of posts about what it’s like to be a data scientist, or, perhaps more accurately, what I did as a data scientist and how that might relate to a new person doing data science in the field.

Some of it will include direct examples of doing data science projects in Python, and some of it will be more about the tools of the trade and how to work with open source tools to do data science. And some posts, like this one, will be more about “life as a data scientist.”

Source: Being a Data Scientist: My Experience and Toolset · Jefferson Heard

 

Oren Jacob’s answer to Did Pixar accidentally delete Toy Story 2 during production? – Quora

Hi everyone, I’m the Oren Jacob in the video. Hopefully I can offer some first-person color commentary about the video above that might serve to answer the questions here.  Note: after 20 years at the studio I left Pixar last year to start ToyTalk (www.toytalk.com), so this answer hasn’t gone through any PR filtering; it’s straight from my foggy memory of those events back in the late ’90s.

First, it wasn’t multiple terabytes of information. Neither all the rendered frames, nor all the data necessary to render those frames (the animation, model, shader, set, and lighting data files), was that size back then.

A week prior to driving across the bridge in a last-ditch attempt to recover the show (depicted pretty accurately in the video above), we had restored the film from backups within 48 hours of the /bin/rm -r -f *, run some validation tests, rendered frames, somehow got good pictures back and no errors, and invited the crew back to start working.  It took another several days of the entire crew working on that initial restoral to really understand that the restoral was, in fact, incomplete and corrupt.  Ack.  At that point, we sent everyone home again and had the come-to-Jesus meeting where we all collectively realized that our backup software wasn’t dishing up errors properly (a full disk situation was masking them, if my memory serves), our validation software also wasn’t dishing up errors properly (that was written very hastily and, without a clean state to start from, was missing several important error conditions), and several other factors were compounding our lack of concrete, verifiable information.

The only prospect then was to roll back about 2 months to the last full backup that we thought might work.  In that meeting, Galyn mentioned she might have a copy at her house.  So we went home to get that machine, and you can watch the video for how that went…

With Galyn’s machine now back in the building, we duped that data immediately, then set about the task of trying to verify and validate this tree, which we thought might be about 2 weeks old.  We compared Galyn’s restoral with a much older one (from 2 months prior) and couldn’t determine a clear winner; there were too many inconsistencies.  So, instead, we set about assembling what effectively amounted to a new source tree, by hand, one file at a time.  The total number of files involved was well into the six figures, but we’ll round down to 100,000 for the sake of the rest of this discussion to make the math easier.

We identified the files that hadn’t changed between the two, and took those straight away.  Then there were the files that were on Galyn’s but not on the older one; we took Galyn’s and assumed they were new.  Then there were files that were on the older one but not on Galyn’s; we put those in the “hand check” pile, since it is unusual for files to be deleted within a production source tree, and we were suspicious of those deletions.  Then there were the files that were different across the two backups, those also went into the “hand check” pile along with any files that were touched more recently than Galyn’s version.
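
In rough terms, that triage boils down to something like the following sketch (illustrative Python only, not the tooling actually used at the time; it assumes each backup tree is a mapping from file path to a content hash and modification time, and every name in it is made up):

    # Illustrative sketch of the triage rules described above -- not actual
    # Pixar tooling. Each backup tree is assumed to be a dict mapping a
    # relative path to (content_hash, mtime); GALYN_BACKUP_TIME stands in
    # for the (hypothetical) timestamp of Galyn's backup.

    GALYN_BACKUP_TIME = 0.0  # placeholder timestamp

    def triage(older, galyns):
        """Split files into ones we trust and ones that need a hand check."""
        keep, hand_check = {}, set()

        for path in older.keys() | galyns.keys():
            in_older, in_galyns = path in older, path in galyns
            if in_older and in_galyns:
                if older[path][0] == galyns[path][0]:
                    keep[path] = galyns[path]   # identical in both: take it
                else:
                    hand_check.add(path)        # differs across the backups
            elif in_galyns:
                keep[path] = galyns[path]       # only on Galyn's: assume new
            else:
                hand_check.add(path)            # only on the old backup:
                                                # a suspicious deletion

        # Anything touched more recently than Galyn's backup is also suspect.
        for path, (_, mtime) in list(keep.items()):
            if mtime > GALYN_BACKUP_TIME:
                del keep[path]
                hand_check.add(path)

        return keep, hand_check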

Given that, we had something like 70,000 files that we felt good about, and we poured those into a new source tree.  For the remaining 30,000 files, it was all hands on deck.

We checked things across 3 partially complete, partially correct trees: the 2-month-old full backup (A), Galyn’s (B, which we thought was the best one), and another cobbled-together tree (C) built from the stray files left around from failed renders, backup directories on animators’ machines, some heads of source history that were left untouched, verbose test renders, and other random stuff we could find via NFS elsewhere in the building.

We invited a select few members of the crew back to work straight through from Friday to Monday morning.  We took rolling shifts to sleep and eat and kept plowing through, file by file, comparing each of the files in the “to be checked” list from A, B, and C, doing our best to verify and validate them, one at a time, by looking at them in xdiff.

In the end, human eyes scanned, read, understood, looked for weirdness, and made a decision on something like 30,000 files that weekend.

Having taken our best guesses at those suspect files, we assembled a new master of ToyStory2.  Many source histories were lost as a result, but we had the best version we could pull together.  We invited the crew back, and started working again.  Every shot went through a test render, and surprisingly, only a dozen or so failed.

I know full well that the following statement will likely blow people’s heads up, but the truth is that more than several percentage points of the show (as measured in numbers of files) were never recovered at all. So how could ToyStory2 work at all?  We don’t know.  The frames were rendering (other than those dozen shots), so we just carried on, fixed those shots, and charged ahead.  At that point, there was nothing more that could be done.

And then, some months later, Pixar rewrote the film almost from the ground up, and we made ToyStory2 again.  That rewritten film was the one you saw in theatres and that you can watch now on Blu-ray.

Source: Oren Jacob’s answer to Did Pixar accidentally delete Toy Story 2 during production? – Quora

 

The fivethirtyeight R package

Andrew Flowers, quantitative editor of FiveThirtyEight.com, announced at last week’s RStudio conference the availability of a new R package containing data and analyses from some of their data journalism features: the fivethirtyeight package. (Andrew’s talk isn’t yet online, but you can see him discuss several of these stories in his useR! 2016 presentation.) While not an official product of the FiveThirtyEight editorial team, it was developed by Albert Y. Kim, Chester Ismay, and Jennifer Chunn under their guidance. Their motivation for producing the package was to provide a resource for teaching data science:

We are involved in statistics and data science education, in particular at the introductory undergraduate level. As such, we are always looking for data sets that balance being

  1. Rich enough to answer meaningful questions with, real enough to ensure that there is context, and realistic enough to convey to students that data as it exists “in the wild” often needs processing.
  2. Easily and quickly accessible to novices, so that we minimize the prerequisites to research.
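
The package itself is aimed at R users, but FiveThirtyEight also publishes many of its raw data sets as plain CSVs on GitHub. A rough Python-side analogue, purely as a sketch (the repository path below is an assumption, not something taken from the announcement), would be to load one of those files directly with pandas:

    # Sketch: browse a FiveThirtyEight data set from Python rather than via
    # the fivethirtyeight R package. The file path below is assumed, for
    # illustration only; any CSV published under github.com/fivethirtyeight/data
    # can be loaded the same way.
    import pandas as pd

    URL = ("https://raw.githubusercontent.com/fivethirtyeight/"
           "data/master/bechdel/movies.csv")  # assumed path

    movies = pd.read_csv(URL)
    print(movies.shape)              # (rows, columns)
    print(movies.columns.tolist())   # available variables
    print(movies.head())             # first few records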

Source: The fivethirtyeight R package