For entrepreneurs aiming to solve humanity’s communication problems, here’s some free advice: give people communication skills, not another way to distribute their half-baked thoughts. We’ve got plenty of ways to send text around the internet. But communication skills – they haven’t improved much since the proliferation of the internet (if at all).
The problem is that it’s not easy to monetize someone else’s communication skills (who will fund you?). Aside, of course, from hiring them to solve some pressing communication problem. But we can’t all work in PR.
It’s easy to see how we got 140 characters. Advertising. But where are our flying cars? You can barely spend 10 minutes reading or talking about startups and technology without a reference to how we still don’t have flying cars.
We wanted flying cars, instead we got 140 characters
I appreciate the sentiment, but the problems we really need to solve to get flying cars are economic ones. Flight uses a lot of energy. Energy is expensive. The real obstacle is the cost of flying cars – we’ve already figured out air travel – it just costs too much to be practical for day-to-day use.
I have a hard time imagining the founders of Twitter thinking to themselves, “Let me go work in BP’s research division so we can eventually have flying cars.” But that’s the reality of it. That or find a way to break the laws of physics, but the prospects aren’t looking good for that. There’s a Nobel Prize waiting for you if you succeed on that front though.
“Let’s focus on the hard problems.” Yes, let’s! But let’s also be honest with ourselves: the hard problems we need to solve are in energy, agriculture, and biology. And hey! What do you know‽ We, as a global society, are spending a lot of resources in those areas. The problems are just really hard.
I don’t mean to say everything is fine or that there aren’t other things to improve (as a species, I disdain nationalism), but I think the problem is overstated.
It’s great that SpaceX is trying to get us on Mars, but again, a huge factor in why there isn’t much investment in that area is cost. Before any number of people are going to go off and colonize Mars, we need to reduce the cost to a level that is affordable at scale.
Musk understands this well:
Elon states that he wants to make a trip to the Red Planet affordable for an average American Family. Affordable, he later said, is “no more than half a million dollars”.
Because as long as it costs millions (billions?) of dollars – which we could otherwise spend saving lives here on earth – a mission to colonize Mars is going to be unpalatable. Same for those fancy flying machines you’re yearning for.
Turns out this is a side effect of things going badly with Top Sites.
You can disable Top Sites altogether:
Go to “Safari” > “Preferences” > “General” and change new windows and new tabs to open with a blank page (or anything besides Top Sites)
Try to figure out which of your top sites is causing the issue. I suspect that if there is one with no preview (having the black background with a grey Safari icon overlaid instead) it is probably the culprit, but I can’t verify that. [update]: I have confirmed this to be the case.
Safari has to run and render all code in the website in order to generate the preview. The more code it has to run, the more potential for problems.
Anyway hope this helps some poor person with this problem :)
edit: I’ve just discovered that if you hover over the “Safari Web Content” item in Activity Monitor, the hover text will show the URL of the page being rendered!
edit 2: The trick mentioned in the previous edit is nice, but doesn’t work on the Safari Web Content process that hangs, only the ones spawned for browser tabs. Doh!
I’ve been reading Peter Harrington’s “Machine Learning in Action,” and it’s packed with useful stuff! However, while providing a large number of ML (machine learning) algorithms and sufficient example code to learn how they work, the book is a bit dry.
So I’ve decided to make my contribution to democratizing ML by posting simple explanations of these algorithms.
Pure Python isn’t the most (computationally) efficient way to implement these algorithms, but that isn’t the purpose here. The goal is to help humans understand how these algorithms work. Python is great for that. That’s why the book uses Python as well.
But Harrington takes the alternate route of using the (very powerful) numpy from the get-go, which is more performant but much less clear, and that clarity is lost at the expense of the reader.
Well that’s crap; let’s start learning!
What is KNN (K nearest neighbor) good for?
This is a good question to answer up front. Why are we doing this in the first place?
KNN is a “classifier”, which is a type of algorithm that (you guessed it) classifies things.
Let’s put it in more concrete terms: We want to teach the computer to answer the question, “What kind of fruit is this?”
You’re the owner of an orchard, and you’re tired of paying workers to sort your fruits on the assembly line. The job is boring, the workers hate it, and you already measure the weight and color of every fruit on the line anyway. It should be simple enough to have a machine do it.
You have a set of already classified (categorized, tagged, etc.) information, and you want to automatically figure out where new data (fruits) fits into your classification. That is, is it an Apple or a Banana?
Here’s some fruit the workers logged before they got shit-canned:
Notice they assigned numbers to the colors. That’s useful because we need to do math with these values (numbering non-numerical stuff is what I’m calling discretizing here, though the more standard term is encoding). The colors are in order of the color wheel, so similar colors are closer together than less similar colors.
Here’s the color key from the foreman’s clipboard:
So our data set has some apples which are red, green, and yellow, and a bunch of bananas which are all yellow except one that is green.
It’s 9 AM.
A loud bell rings.
The conveyor belt starts turning, and fruit starts flowing in from outside.
At this point, which classification the unknown fruit belongs to is determined by taking a vote of the “k” nearest neighbors – so if “k” is 3, then we take the top 3 fruits by distance and select whichever is most common.
In this case we see that the top 3 are all “Apple”, so we conclude this unknown fruit must be an apple.
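Here’s a minimal sketch of that vote in pure Python. It assumes the distance is the usual straight-line (Euclidean) distance over weight and color. The fruit data below is made up purely for illustration, and the names `KNOWN_FRUITS`, `distance`, and `classify` are my own.

```python
import math
from collections import Counter

# Hypothetical training data: (weight in grams, color code, label).
# These values are invented for illustration only.
KNOWN_FRUITS = [
    (303, 3, "Banana"),
    (298, 3, "Banana"),
    (277, 3, "Banana"),
    (370, 1, "Apple"),
    (377, 4, "Apple"),
    (382, 1, "Apple"),
]


def distance(a, b):
    """Straight-line distance between two fruits using weight and color."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


def classify(unknown, known, k=3):
    """Label an unknown (weight, color) fruit by a vote of its k nearest neighbors."""
    # Sort the known fruits from closest to farthest.
    by_distance = sorted(known, key=lambda fruit: distance(unknown, fruit))
    # Count the labels of the k closest fruits and return the most common one.
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify((373, 1), KNOWN_FRUITS))
```

With this made-up data, the three closest neighbors of a 373 gram fruit with color 1 are all apples, so the vote comes out “Apple”.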
You can expand this to more than two features though. You can actually use that distance formula from earlier with as many dimensions as you want.
Let’s try it with 4:
So if we’d done this using more characteristics of the fruits than just weight and color (like number of seeds in the fruit for instance), the distance calculation (we have 3 factors now) would have just been:
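Assuming the distance formula is the usual Euclidean one, a sketch of the three-factor version could look like this. The function names and the (weight, color, seeds) tuple layout are my own choices for illustration.

```python
import math


def distance3(a, b):
    """Euclidean distance between two fruits given as (weight, color, seeds)."""
    return math.sqrt(
        (a[0] - b[0]) ** 2
        + (a[1] - b[1]) ** 2
        + (a[2] - b[2]) ** 2
    )


def distance_n(a, b):
    """The same calculation for any number of features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

The pattern is the same each time: one squared difference per factor, summed, then a square root. `distance_n` just lets `zip` do the repetition.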
You: This is repetitive
True. This code is designed to make it easy to understand… in real life, you should use numpy (or similar) for performance reasons anyway (ML is very computationally expensive).
You: What if one factor is more important than the others?
That’s a really good point. Maybe the number of seeds is much more important than the color of the fruit (it is), but color is still an important differentiator among fruits with the same number of seeds?
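One common way to handle this (not necessarily the only one) is to multiply each factor’s squared difference by a weight before adding them up. This is only a sketch: `weighted_distance` is a name I made up, and any weights you pass in are your own judgment call.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance where each factor's squared difference is scaled by a weight."""
    return math.sqrt(
        sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))
    )
```

A weight of 1 for every factor gives the plain distance, a bigger weight makes that factor count for more, and a weight of 0 ignores the factor entirely.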
Neutralizing the effects of different units
Right now your weight values are much bigger than our color ones, which we’ve discretized to single digit numbers.
That means weight is causing much bigger changes in distance between fruits than color is.
What are we going to do about that?
Well, what if we measure all our inputs on a scale of 0 - 1.0?
That’s Normalization Kyle!
In short, we’re going to take the biggest value in the dataset and the smallest value in the dataset and put all the other numbers on a scale of 0.0 - 1.0 from smallest to biggest.
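As a minimal sketch of that rescaling (usually called min-max normalization), here’s one way to do it for a single column of values. The function name and the choice to return all zeros when every value is identical are my own assumptions.

```python
def normalize(values):
    """Rescale a list of numbers so the smallest becomes 0.0 and the biggest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is the same, so there is no spread to scale by.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

You’d run this once per column (weight, color, and so on). An unknown fruit has to be rescaled with the same smallest and biggest values as the known fruits, otherwise its distances won’t be comparable.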
After you’ve classified the 3 Unknown Fruits, consider which columns you could remove without losing any accuracy. It’s often the case that simpler classifiers are better, and the facets of your data may not be as related as you originally thought!
This is my first crack at this type of tutorial, so please give me feedback, and/or corrections! (email: email@example.com )
I'm sitting on a train, making my 45-minute, 3-mile commute.
And by train, I mean: tiny aluminum can filled to capacity with iPhones and their owners.
The alarming speed and sheer mass of concrete above our heads isn't getting any attention from these nerds.
Because they're too busy looking at iPhones.
But wait a minute – not their own iPhones. A lot of them are looking at somebody else's. In fact, I'd say in a given subway ride at least half of them will glance at their neighbor's display. You know you've done it. Moving, flashing lights are hard to ignore.
Eavesdropping. iVesdropping? Heh. I love a good pun.
Let's talk about pervasive Internet. That idea mobile developers keep spouting about how we have Internet access "everywhere" thanks to our iThings. What shit.
I spend 90 minutes a day using an iPhone with no Internet. That's very possibly the majority of my phone usage. 5 days a week.
And I'm not the only one.
Most of the iVesdropping I see is people watching somebody else play a game.
I think it's because games are immersive and the device owner is least likely to look up and trigger that awkward moment where you both realize just how long you've been snooping.
Speculations aside, this is not going away. And I know I've searched the App Store on more than one occasion for an app I saw in that sardine can.
Guess which apps I never see down there. Words With Friends, SongPop, Facebook, Twitter, Buffer.
All those social ones that demand network access.
But Mail works, so does Podcasts, and Reeder, and Letterpress (sort of).
And I know that not everyone lives in the city. But cities are cultural centers. Getting big in New York or San Francisco can catapult an app into the charts, and the visibility of being in the charts can make or break your sales numbers.
Dear app developers, I'm begging you. Please make your apps work offline. At the very least make sure they don't crash when you launch them without Internet (I'm looking at you, Zynga).
Have you ever tried to sit down and come up with a good idea?
It's really hard.
After long, painful hours you still have a blank sheet of paper. Not a single good idea written down. Sound familiar?
I think we've all been there. If you've ever been a student you know exactly what I mean.
The craziest part is that during those excruciatingly long minutes, you probably weren't even thinking of ideas the whole time. An idea comes into your mind, you consider it, decide it's not the one and then you try to think of something else.
But then you get distracted.
You'd be appalled how much time distractions take out of the process when you do “brainstorming” this way.
I've been in this situation many, many times, and I finally realized something that totally changed the way I approach creative thinking:
Coming up with 1 good idea is actually harder than coming up with 10 good ideas.
It sounds crazy, but I'm about to convince you it's true ;)
Being prolific when you're brainstorming is absolutely key to finding good ideas.
In part, because as humans, we're not very good at focusing our thoughts when the rest of our body is idle. But also because we're notoriously bad at identifying which ideas are good, and which ones aren't.
The trick is to accept failure from the start.
Most of your ideas will suck, but that's fine. Just write them all down. Every. Single. Stinking. Idea.
Don't stop until you have at least 15 (and hopefully 30, 50, or 100). Every idea you write down is a ticket in the Good-Idea Lottery.
Obviously this exercise leaves you with a list of dozens of ideas which you now have to vet, but (and here's a third reason to do it my way):
Vetting ideas and coming up with ideas are different frames of mind.
When you're coming up with ideas you want to be open minded, creative, and optimistic. When you're vetting ideas you want to be analytical, realistic, and tactical. Trying to do both is hard – yet another fault of our species – we suck at multi-tasking.
So how do you vet all these ideas?
Well, I thought about this for a while and the answer is really "it depends," as it always seems to be (nod to Startups for the Rest of Us).
But in most cases (all?) you are well served by soliciting opinions from others.
So do that.
If it's artwork, show it to people with good taste.
If it's a business idea, show it to some would-be customers.
At first I was just keeping handwritten lists of people's feedback. Then it was Excel spreadsheets.
Business ideas, marketing plans, songs, resume designs… all kinds of stuff.
I was really just collecting anecdotes – I'd ask 3 people here, 5 people there – sometimes that's good enough. It's certainly a good place to start.
As I continued my education I learned about statistical significance and other fun things, and the urge to make these decisions based on (more) data resurged.
I was an economics student, and a closet hacker.
There are a lot of ways to gather data, but I didn't use any of them. I did what any naïve hacker would: I built myself a tool that took all these unwashed ideas and ranked them.
And I called it the Whicher. (This may sound familiar if you follow me on Twitter, Facebook, etc.)
Essentially, you put in a question and 20 images, and it shows people 2 at a time until they've gone through all of them.
You're probably wondering at this point, "why not just ask people to rank the ideas instead of this convoluted tournament thing?"
That was my first plan.
Let's rewind. It's story time…
My band was making a CD. There were 6 of us which meant lots of arguments about how much violin should be in the chorus of song X, and whether or not it was a good idea to cover songs our friends had written.
One day we're all sitting around in the "Recording Studio" (i.e., my parents' formal living room) trying to pick which 10 of our 30ish songs deserved to be included in the album.
We took ourselves very seriously. Don't laugh, this is important.
So Matt (drummer) proposes that we all make a top 10 list and then we'll all compare. 10 minutes of scribbling, crossing out, eraser dust, new sheets of paper, and general kindergarten activity ensues.
When Matt read my list his reaction was, "Really? You like Song A more than Song B?" And the answer was, "No." I didn't.
That's the funny thing about rankings. You can get circular logic: Song A beats Song B, Song B beats Song C, but Song C beats Song A.
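To make that loop concrete, here's a small sketch that looks for one in a set of head-to-head results. The `beats` set of (winner, loser) pairs and the function name `find_cycle` are made up for illustration. This is not how the Whicher works internally, just a way to see the problem.

```python
from itertools import permutations


def find_cycle(beats):
    """Return the first trio (a, b, c) where a beats b, b beats c, and c beats a.

    `beats` is a set of (winner, loser) pairs. Returns None if no such trio exists.
    """
    items = sorted({name for pair in beats for name in pair})
    for a, b, c in permutations(items, 3):
        if (a, b) in beats and (b, c) in beats and (c, a) in beats:
            return (a, b, c)
    return None


preferences = {
    ("Song A", "Song B"),
    ("Song B", "Song C"),
    ("Song C", "Song A"),
}
print(find_cycle(preferences))
```

When a trio like this exists, no single ordered list can agree with all three head-to-head answers, which is exactly what a top 10 list forces you to paper over.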