0:00:00.033,0:00:07.332
So cleaning the input, which is more technically also known as pre-processing--what's the idea behind that?
0:00:07.333,0:00:12.066
Here's what you would normally do for an NP-complete problem as we have talked about so far:
0:00:12.067,0:00:18.966
So if you're given the input for an NP-complete problem. What you would do using the techniques from the previous units
0:00:18.967,0:00:22.966
is you would fire up your search tree to try and find an optimal solution.
0:00:22.967,0:00:27.899
And of course, that search tree has exponential size. So the algorithm goes through that tree here
0:00:27.900,0:00:34.566
until, at a certain point in time, it says, "Bingo, I found a solution," or, "I found the best possible solution."
0:00:34.567,0:00:41.366
The idea of pre-processing now is similar to something that we already saw for vertex cover or independent set,
0:00:41.367,0:00:47.032
where, for certain vertices, while we were traversing the search tree, or even in advance,
0:00:47.033,0:00:54.499
what we could already say for certain vertices, we know what assignment that vertex is going to have in an optimum solution.
0:00:54.500,0:00:59.932
And we could make that statement without actually going through any branching, but in polynomial time.
0:00:59.933,0:01:06.832
And that is the idea of pre-processing. The idea of pre-processing is, if you can actually find certain parts of the input,
0:01:06.833,0:01:15.666
where in polynomial time, of course, you can already say how they would be handled in an optimum solution.
0:01:15.667,0:01:21.866
So we're kind of nibbling away at the input here. And what that, of course, means is if your pre-processing is successful,
0:01:21.867,0:01:29.499
or especially if it's very successful, then the search tree that results from that input is not going to be as big.
0:01:29.500,0:01:34.432
So there's certain parts of the search tree that you don't have to do, because you already have found out
0:01:34.433,0:01:38.466
in the pre-processing what that part of the solution is going to look like.
0:01:38.467,0:01:44.532
So the search tree size will decrease. So we can, for example, cut off this branch here, because we've already
0:01:44.533,0:01:49.666
pre-processed this, and we can cut off this one here because we also pre-processed that one.
0:01:49.667,0:01:52.666
So now let's make this more concrete, and let me give you a concrete example.
0:01:52.667,0:01:58.666
And we're going to do this for SAT, because SAT is a problem where pre-processing is usually very successful.
0:01:58.667,0:02:05.299
So if you were, for example, to use a commercial SAT solver, then pre-processing will play a very very important role in that.
0:02:05.300,0:02:11.866
I once talked to somebody who develops those solvers, and they basically said that his package works 90-95%
0:02:11.867,0:02:19.166
through pre-processing. So even for SAT instances with thousands of variables, his package can basically solve it,
0:02:19.167,0:02:23.566
but it can only solve it because the pre-processing algorithms are very good.
0:02:23.567,0:02:31.266
So you'll remember that SAT was the problem of finding if a given Boolean formula has a satisfying assignment or not.
0:02:31.267,0:02:35.766
And I'm now going to write down a Boolean formula for you, and then we're going to do a little quiz
0:02:35.767,0:02:38.766
to make pre-processing more concrete.
0:02:38.767,0:02:48.666
So the SAT formula is x1 or x3 or x5, and not x1 or x2 or x4, and so on, and so on.
0:02:48.667,0:02:53.066
Now, of course this formula here doesn't have very many variables. It's just six variables--
0:02:53.067,0:03:01.432
x1, x2, x3, x4, x5 and x6. So with a little playing around, you would probably be able to figure out if this Boolean formula here
0:03:01.433,0:03:06.466
has a satisfying assignment or not. But of course, what we want to do now is pre-processing,
0:03:06.467,0:03:12.566
And that means that we want to see if, for certain variables, in this Boolean formula, we can figure out
0:03:12.567,0:03:18.132
if they should be set to true or false, without actually trying all possible combinations.
0:03:18.133,0:03:23.732
And as I said, we're going to do this as a quiz. So what I would like you to do is to look at this Boolean formula here,
0:03:23.733,0:03:33.532
and then consider the variables x1, x2, x3 and x4, and for each of those variables, determine if it's easy to see,
0:03:33.533,0:03:38.332
if they should be set to true or false. And by easy, I mean without actually trying around different true assignments
0:03:38.333,0:03:44.699
for the other variables, but you can basically immediately say, for these variables, if they should be set to true or false.
0:03:44.700,0:03:50.832
I'm going to give you one hint for the solution, and that is that, in my opinion--and this is a bit of a subjective question--
0:03:50.833,0:03:59.967
I think that for two of these variables here, it's rather easy to see. And I would like you to select those two.