One must have chaos to be able to give birth to a dancing star

“I am a disciple of the prophet Dionysus ” — Friedrich Nietzsche

Information Technology recommendations on data structures: the more structured, the more clean and standardized, the better. It increases productivity, makes everything more organized. Standards diminish rework and confusion with data. Excel spreadsheets are the worst thing in the world, because the user will create columns, insert empty lines, change the header: it will be a mess.

This discourse seems perfect.

But, also as Dionysus’ disciple, I say NO.

  • The more chaotic the database, the better.
  • The dirtier, the better.
  • The more the user messes up, the better.

Apollo x Dionysus

The German philosopher Friedrich Nietzsche had the remarkable ability to combine philosophical ideas with a poetic style. One of his most famous metaphors is the contrast between the Greek gods Apollo and Dionysus.

Apollo is the god of the Sun, Harmony, Medicine. He represents the Order: the beautiful god, tall, strong, symmetrical, organized. He is the god of the arts, giving precise shapes to the sculptures, creating order in chaos.

Dionysus is the god of wine. He represents the Chaos. Fat, short, ugly, drunk, crooked, everything in him is bad. He is the ecstasy, the drunkenness. Born of hunger and pain, he reborns each spring and spreads joy wherever he goes.


Order x Chaos

The order seems better than the chaos. But the fact is that we can bring order only to a very small part of the world. The universe, infinitely greater, will never be known by humans.

I imagine the two gods as two arms: chaos and order, complementing each other. We should have the humility to recognize that there are premises that will always be out of any model .

I have a very strong Apollonian side, by my background in engineering, math Olympiads, etc. But I also have a very strong Dionysian side, which makes me very skeptical of everything that wants to organize too much, optimize too much, giving little room to the unforeseen.


Why I like chaotic data structures?

I work extensively with innovation, creating new tools, new processes, new ideas.

When working with a new project, the client don’t have the slightest idea of ​​what he wants. He knows only the symptom (say, analysts lose too much time to generate the report), and assumes that he knows the solution (automatically generate a report).

But is this the real problem? No one knows, this must be discovered.

Sometimes, he even does not need the report he thought he needed. Anyway, in 100% of cases, it’s useless to structure databases to try to solve efficiently the wrong problem.

As Peter Drucker says:

There is no greater waste than to solve with great efficiency a problem that didn’t needed to be solved.

The recommendation of structured databases is only for mature processes, those already established and that will change little.

For innovations, the more prototypes, the better. The more quick-wins, simple, fast, flexible and therefore chaotic, the better. The dirtier the excel, the better. The more the client messes up, the better.

No new music arises from the order. No new picture emerges of order. No new idea arises from the order. Only from chaos.

One must have chaos to be able to give birth to a dancing star” — Friedrich Nietzsche

More forgotten lore at: https://medium.com/@arnaldogunzi

Portuguese version: https://ideiasesquecidas.com

An abnormal man looking for an interesting number

After taking the picture below, from the building written “normal”, the abnormal guy of the photo — me — remembered an “interesting” paradox.

Suppose I list the natural numbers in order:

  • 1
  • 2
  • 3
  • 4

Now let’s say something interesting about each of these numbers:

  • 1 is the first number of all, it is divisor of all the others
  • 2 is the first and only prime number
  • 3 is the first odd cousin
  • 4 is the first perfect square

Let’s say the numbers with the an interesting property are called “interesting” numbers.
And the numbers that are not interesting are the “normal” numbers.

Using this definition, a list would look like this:

  • 1 is an interesting number
  • 2 is an interesting number
  • 3 is an interesting number
  • 4 is an interesting number

Now suppose the number x is the first “normal” number in the list.

  • 1 is an interesting number
  • 2 is an interesting number
  • 3 is an interesting number
  • 4 is an interesting number
  • x is a normal number

But if x is the first “normal” number, it is an “interesting” number because it has an interesting property: to be the first “normal” number.

On the other hand, if we consider x an “interesting” number by having the property of being the first “normal” number, it is no longer a “normal” number and now it is an “interesting” number, this way losing the property of being the first “Normal” and then ceasing to be “interesting” …

What a mess! It’s not “interesting”?

To tell you something interesting, this problem is the “Paradox of Richard’s Numbers,” described by the mathematician Jules Richard in 1905.


This link (https://en.wikipedia.org/wiki/Richard%27s_paradox) tells more details about Richard’s paradox, but in a less interesting way than here.

Another similar paradox is the “Liar Paradox”. A man who only tells lies says “I’m lying.” But as he only lies, he will be telling the truth in this statement. But if he speaks the truth, he is not the one who only tells lies.

These paradoxes “bugs” not only the minds of ordinary human beings, but also the minds of the greatest mathematicians in history.

The Austrian mathematician Kurt Godel demolished the foundations of all mathematics in 1931, with its Incompleteness Theorems, by proving that mathematics can not at the same time be Complete and Consistent. That is, mathematics has limits. Godel found a “bug” in the foundations of mathematics — it can not at the same time get rid of these bizarre paradoxes and answer True or False to all its propositions. Godel used a sophisticated version of Richard Paradox to prove it.

It’s a long story, which involves mind giants like David Hilbert and Bertrand Russell, and it’s for another day.

By the way, I think the author of the building light wrote “normal” in an “interesting” way only for the building not to be “normal” anymore, and thus to confuse our head …

 

OpenSolver Example 1

This is a very easy example, inspired on Forest Planning.

Suppose we have two paper mills, ABC01 and ABC02. Each has a demand of wood in a given period.

os01.JPG

There are stands (areas of forest) that will be harvested. Each has a volume of wood. Let the index of stands be i.

os2.JPG

The distance between each stand (i) and paper mill (j) is given.

os03.JPG

How to supply the paper mills (supply >= demand), with the objective function of minimization of the Distance?

 


General rules

Because it is a linear problem, we can use only linear functions like “sumproduct” and “sum”. “Sumif” or “vlookup” are non linear.

There are a lot of advantages in modelling as a linear problem. We can use all the power of linear solvers like CBC. We have the guarantee of an optimal solution. Besides, the solution time will be faster than in the non linear case.

The model is very similar to solver.


Solution

We define a binary variable, aloc(i,j).

os04.JPG

The volume for each destination is aloc(i,j)*volume(i)

os05.JPG

It must be greater than demand(j)
each j, sum(i, aloc(i,j)*volume(i)) >= demand(j)

There is a need to impose a structural constraint: each stand must be destined to only one paper mill.
each i, sum(j, aloc(i,j)) <=1  (note isn’t an obligation to harvest the stand)

os06.JPG
The objective function is minimize aloc(i,j)*distance(i,j)

os07.JPG
Once the Excel file is ready, just set the Model in opensolver.

os08.JPG

Set the objective function, each constraint and save the model.

os09.JPG
Press solve, and it will give the solution.

os10.JPG
OpenSolver shows the formulation, with the colored boxes. The “Show/Hide Model” makes it easy to hide the formulation.

 

os08.JPG

It’s very easy, and similar to Solver. And also powerful, as we saw in Introduction.

Arnaldo Gunzi


 

The Excel file for download is here.

 


 

Read also

OpenSolver – Introduction

 

Tutorial Open Solver – Introduction

Solver versus OpenSolver

Perhaps the easiest and most famous tool for linear programming is Microsoft Excel Solver. It is the tool people know from college, or some friend talked about, along with the infamous and magical “goal seek”.

Excel solver is quite intuitive, easy and ubiquitous (since Excel is ubiquitous). Solver is developed by a company named Frontline (http://www.solver.com/), not by Microsoft.

 

Frontline.JPG

 

I am a great fan of Excel, but I hate Solver. Mainly because Frontline solver has a limitation of 200 variables.

Try to solve more than 200 variables, and the following message will appear:

 

SolverMessage.JPG
Unfortunately my Excel is in portuguese, but the message is clear

It is very very frustrating to discover this limitation after you spent some hours modelling the problem.

Frontline provides more powerful versions of solver, but it is not free. If you are a company, that’s ok, just buy it. But if you are a student, it is not a trivial value. Luckly, there are alternatives.

 


 

OpenSolver

The OpenSolver tool can help a lot in these cases. It is a Great (upper case Great) initiative by Andrew Mason, University of Auckland, New Zealand. It has a clean, easy interface, and calls a free source solver like CBC (another Great initiative by Coin-OR – http://www.coin-or.org/).

Opensolver.JPG

 

Because the solver is CBC, we don’t have limitation on number of variables, constraints or whatever.

 

And because OpenSolver is open source, it is easy to learn a lot from the VBA code and also help the development of it. The limit is our imagination.


 

Next posts will cover these topics.

  • OpenSolver Example 1
  • OpenSolver Example 2
  • OpenSolver Example 3
  • OpenSolver Example with VBA
  • Conclusion: Spreadsheet versus Professional tools

 

Enjoy!

Arnaldo Gunzi.

 

Knapsack Problem – Dynamic Programming

Dynamic Programming

Dynamic Programming is one very useful technique for solving problems like the Knapsack problem.

The basic idea of Dynamic programming is to break down a problem in recursive subproblems. Each subproblem must be easy to solve, and its solution stored in some data structure to be used again.

In a schematic drawing, it’s something like this:

 

DynamicScheme.jpg

Dynamic programming is not a general procedure to be applied in any problem. The problem has to allow it. Richard Bellman, the creator of Dynamic Programming, called it  “Principle of optimallity”. It is like saying “this method works in the problems it works, and doesn’t works in the problems it doesn’t works”. But mathematicians love this kind of concept, because they can ask: which problems has the principle of optimality? Is the Knapsack one of them?

Since this method is not general, the programmer has to design a new method for every different problem and also figure out if the problem has the principle of optimality.

But, despite all these observations, it is a useful way of thinking.


 

The Algorithm

The detailed algorithm can be found in Wikipedia  (https://en.wikipedia.org/wiki/Dynamic_programming). I’ll sketch here the main ideias.

Suppose the problem is the same of previous post.
Max Weight = 14.
List of itens:
ValueWeight

1) It is easy to compute the solution with 1 item for a range of weights.

Array of values
Dyn01.JPG
Item 1, with weight 8 and value 10,000. If we vary the max weight of the knapsack from 0 to MaxW, the value array will be Zero before we have max weight = 8, and 10,000 after it.

 Array of picked itens:

Dyn02.JPG
“i1” refers to item 1

 


2) We increase the number of itens, and compare current solution versus solution with new item.
Step2_details
With items 1 and 2: if the max weight is 6, we pick item 2. When max weight is 8, we compare solution with weight 8 from item 1 (10,000) against solution with item 1 and weight 2 plus item 2 (0) and weight 6 (3,000). It is better to keep the 10,000 value solution.
Step2_details2.JPG
When maxWeight is 14, we compare current solution with item 1 (10,000) versus solution with item 1 and weight 8 (10,000) plus item 2: 13,000
Complete array with 2 items.
Step2.JPG
Evaluation arrays with itens 1 and 2
Proceeding this way, we get the full evaluation matrix.
Step5.JPG
Evaluation array with 5 items.
The optimum is to pick itens 1, 3 and 4, and the value of those items is 14,000.
Note that, in order to get the optimum, I had to calculate every solution for 1 to number of itens and 1 to number weigths. Thus, the computational effort to calculate it is proportional to N*maxW.
The problem happens when N and W are large. It needs a lot of computation processing.
Knapsack is a pseudo polynomial time algorithm. Polynomial because it is proportional to N. But pseudo since it is not quite true it is proportional to N – it is also proportional to W, and W can be very very large. One day I’ll post a more detailed way of thinking of this.

Divide and conquer
Other question I had when studying dynamic programming for the first time was: what is the difference of this method compared to divide and conquer methods?
I think this discussion deserves a post apart, but just to give a simple explanation, divide and conquer has a “horizontal” structure, with subproblems independent of each other, while dynamic programming has a “nested” structure, with subproblems dependent of other subproblems.

Computation implementation
An Excel- VBA implementation of dynamic programming applied to the knapsack problem can be found at https://github.com/asgunzi/KnapsackDynamicProgrammingVBA.
It uses the Random case generator of previous post and generates a solution for this case.
ScreenDynProg.JPG

Other Applications
Dynamic programming can be applied in several other cases. Cutting stock problems are also a classic application.
One less known application is in the optimization of forest multiproducts. We have a forest of eucalyptus. Each tree has to be cut in several pieces. Each piece has a different diameter (since the tree is larger in the base). A product is a combination of diameter and lenght. Given a demand, how can I minimize the waste and maximize the use of the forest?

 

In the next posts, I’ll show alternative methods for solving the Knapsack Problem.

 

Arnaldo Gunzi.

 

 

The knapsack problem

Imagine that you are going to a trip, and everything you have just a knapsack to carry on everything.
New-Unisex-black-backpack-students-backpack-small-travel-knapsack.jpg
The maximum weight you can carry is limited, say 14 kg.
There are a list of itens you’re considering to choose from. Each item has a weight and a value.
ValueWeight.PNG
Item, Value and Weight

 

Which of the itens do I choose, to maximize the value I can carry on my backpack?

 


The “Human method”

The first method described is the “human method”. How you and me would choose the itens.
I would choose the itens with greatest value for me, one by one, till the maximum weight.

In this example, I would choose itens 1 and 2, whose sum of values is 13.000 and sum of weights is 14. It is the “greedy” method, heuristically simple but can be suboptimal.

valueWeight2.png
First choice, the greedy method

 

In the end, I would take a look on my selection, and try to find if I can replace one of the chosen itens for two itens not choosed with greater value.
In this example, a simple inspection shows that the item 2 is too heavy for its benefits. It is better to replace it for itens 3 and 4.
valueWeight3.png
Picking after a more detailed inspection
The final selection contains itens {1, 3, 4}, with total value 14.000 (10.000 + 2000 + 2000) and weight 13 (8 +2+3).
The problem is when the list is very large, with tens or hundreds of itens. It is several timer harder to figure out a good solution.
The human method is very important. It is suboptimal, but it is fast. Humans are good problem solvers. We use intuition and millions of years of understanding of nature, to solve this kind of problem. Human reasoning can be a source for fast approximation algorithms.
Even when the problem is large, humans can solve in a heuristically interesting way. The world is also Pareto, what means that 20% of itens will have 80% of the value. In a intuitive way, I would focus my attention in those 20% and apply the above described method.
In real world application problems, I always respect a lot the people who make the analysis and try to someway incorporate some of theirs thinking in my models – a human centered analytics.

Mathematical formulation

The formulation of the knapsack problem via integer programming is straightforward.
If xi is the decision variable, binary (1 is to carry the item, 0 is to not carry) then
KnapsackFormulation.PNG
In the next posts I’ll show how to solve it with computational methods, as dynamic programming,  integer programming using solver and evolutive metaheuristics.
Arnaldo Gunzi
A big fan of real world heuristics

Human centered analytics

Engineers, computer scientists and technicians alike have affinity with mathematics, numbers, formulas. They are the kind of people who prefer to know How it works than What it delivers. They are the ones who disassemble an old watch and try to fix it.
5920883-disassembled-wrist-watch-lies-on-table-Stock-Photo.jpg
We try to understand why it works. Normal people just wear it.
One of my common mistakes in my first years in the corporate world was that I faced it like it was a college examination: I tried to explain to the executives the hypothesis, the developments, the formulas, and expected then to ask about some technical stuff, like why I used a Chi-Squared distribution instead of a Normal one, as if they were academic teachers.

Surely it was not the case. In the corporate world, and in the real world in general, people want answers. Just it, answers. And they want you to explain the solution in simple, easy-to-follow and reasonable steps. They are not stupid and of course they don’t want to be treated like that. They will check the big numbers and look for contradictions. And they want to know, in human words, why is your solution the correct one. You have to tell a history. It is your task to explain what you mean in simple ways, and to defend your position using the same domain of words (not formulas).

shakespeare-words.jpg
We need to tell histories with words, like Shakespeare. Not formulas.

 


The optimal solution for the wrong problem
In the same line of thinking, I spent a lot of years in my quest to always find the optimal solution with the best up-to-date method for analytical problems.
Wrong. Completely wrong.
An optimal solution for the wrong problem solves nothing.
FindX.png
People in the real world don’t need the optimal solution. They need a solution that works effectively. And the only way to find an effective solution is in the Questions, not in the method of resolution of the model. The problem is in the formulation of the problem.
If you want to solve a real world problem, it will not come with a formulation “Use integer programming to solve this set of equations”. It will come in diffuse information pieces, with dozens of people telling different points of will, with spurious sparse databases. And you will have to formulate the questions.
The model can be as simple as a single linear equation, as long as it provides the effective solution.

Challenges of Analytics
I see a lot of excitement and buzz words: Big Data, Artificial intelligence, Watson, R-Phyton or whatever.
bigdata.jpg

These are only tools. They don’t solve problems alone. They will not solve all the problems of the world.

In a couple of years, these buzz words will fade, some will be forgotten, and new buzz names will appear, promising the same things.

 

For me, better than trying to provide the magical solution, is to make the basics. Simple and effective solutions. And these solutions must:

  • Provide information, guidance, clarity
  • Support decision
  • Empower analysis
That is, provide ways to make questions.
Arnaldo Gunzi
Feb 2016
“The release of atom power has changed everything except our way of thinking. The solution to this problem lies in the heart of mankind. If only I had known, I would have become a watchmaker”
Albert Einstein