Longest common substring in linear time

We know that the longest common substring of two strings can be found in $\mathcal O(N^2)$ time complexity.
Can a solution be found in only linear time?
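
For reference, the quadratic bound is usually obtained by dynamic programming over common-suffix lengths. The sketch below is one way to write it, not code from the original post; the function name `lcs_quadratic` is made up for illustration.

```python
def lcs_quadratic(s: str, t: str) -> str:
    """Return one longest common substring of s and t in O(len(s) * len(t)) time."""
    best_len, best_end = 0, 0
    # prev[j] = length of the longest common suffix of s[:i-1] and t[:j]
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common suffix that ended one character earlier.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return s[best_end - best_len:best_end]
```

Only two rows of the table are kept, so the extra space is proportional to the length of `t`.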

edited yesterday

Glorfindel

2341311

asked 2 days ago

Manoharsinh Rana

1277

algorithms time-complexity strings longest-common-substring

2 Answers

Let $m$ and $n$ be the lengths of the two given strings.

Linear time assuming the size of the alphabet is constant.

Yes, the longest common substring of two given strings can be found in $O(m+n)$ time, assuming the size of the alphabet is constant.

Here is an excerpt from the Wikipedia article on the longest common substring problem.

The longest common substrings of a set of strings can be found by building a generalized suffix tree for the strings, and then finding the deepest internal nodes which have leaf nodes from all the strings in the subtree below it.

Building a generalized suffix tree for the two given strings takes $O(m+n)$ time using Ukkonen's algorithm. Finding the deepest internal nodes whose subtrees contain leaves from both strings takes $O(m+n)$ time. Hence we can find the longest common substring in $O(m+n)$ time.
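
To make the "deepest node with leaves from both strings" idea concrete, here is a minimal sketch that uses an uncompressed suffix trie built naively. This is an assumption-laden illustration: it is not Ukkonen's algorithm, it takes quadratic time and space, and the name `lcs_suffix_trie` is made up. It only shows what the linear-time suffix tree computes.

```python
def lcs_suffix_trie(s: str, t: str) -> str:
    """Longest common substring via a naive (uncompressed) generalized suffix trie.

    Quadratic time and space; a real solution would use a compressed suffix
    tree built in linear time.
    """
    root = {"children": {}, "mask": 0}
    best = ""
    # bit 1 marks nodes reached by suffixes of s, bit 2 those reached by suffixes of t
    for bit, text in ((1, s), (2, t)):
        for start in range(len(text)):
            node = root
            for end in range(start, len(text)):
                children = node["children"]
                if text[end] not in children:
                    children[text[end]] = {"children": {}, "mask": 0}
                node = children[text[end]]
                node["mask"] |= bit
                # A node marked by both strings spells a common substring;
                # its depth (end + 1 - start) is that substring's length.
                if node["mask"] == 3 and end + 1 - start > len(best):
                    best = text[start:end + 1]
    return best
```

The deepest node whose mask has both bits set corresponds to the deepest internal node described in the excerpt.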

For a working implementation, see "Suffix Tree Application 5 – Longest Common Substring" at GeeksforGeeks.

(Improved!) Linear time

In fact, the longest common substring of two given strings can be found in $O(m+n)$ time regardless of the size of the alphabet.

Here is the abstract of "Computing Longest Common Substrings Via Suffix Arrays" by Maxim Babenko and Tatiana Starikovskaya (2008).

Given a set of $N$ strings $A = \{\alpha_1,\cdots,\alpha_N\}$ of total length $n$ over alphabet $\Sigma$ one may ask to find, for each $2 \le K \le N$, the longest substring $\beta$ that appears in at least $K$ strings in $A$. It is known that this problem can be solved in $O(n)$ time with the help of suffix trees. However, the resulting algorithm is rather complicated (in particular, it involves answering certain least common ancestor queries in $O(1)$ time). Also, its running time and memory consumption may depend on $|\Sigma|$.

This paper presents an alternative, remarkably simple approach to
the above problem, which relies on the notion of suffix arrays. Once
the suffix array of some auxiliary $O(n)$-length string is computed, one
needs a simple $O(n)$-time postprocessing to find the requested longest
substring. Since a number of efficient and simple linear-time algorithms
for constructing suffix arrays has been recently developed (with constant
not depending on $|\Sigma|$), our approach seems to be quite practical.

Here is the general idea of the algorithm in the paper above. Let the string $\alpha$ be the concatenation of all $\alpha_i$ with separating sentinels. Construct the suffix array for $\alpha$ as well as its longest-common-prefix array. Apply a sliding window technique to these arrays to obtain the longest common substrings.
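
For just two strings, the steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: the suffix array is built by plain sorting (which is not linear time; a linear-time construction would be substituted in practice), the LCP array uses Kasai's algorithm, and with only two strings the sliding window shrinks to pairs of adjacent suffixes. The name `lcs_suffix_array` and the choice of `-1` as the sentinel are made up for illustration.

```python
def lcs_suffix_array(s: str, t: str) -> str:
    """Longest common substring of s and t via suffix array + LCP array."""
    # Concatenate as integer codes with a sentinel (-1) that occurs in neither string.
    seq = [ord(c) for c in s] + [-1] + [ord(c) for c in t]
    n = len(seq)

    # Suffix array by plain sorting: simple, but NOT linear time.
    sa = sorted(range(n), key=lambda i: seq[i:])
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r

    # Kasai's algorithm: lcp[r] is the LCP of suffixes sa[r - 1] and sa[r]; O(n) total.
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0

    # Window of size two: adjacent suffixes that start in different strings.
    best_len, best_pos = 0, 0
    for r in range(1, n):
        a, b = sa[r - 1], sa[r]
        if (a < len(s)) != (b < len(s)) and lcp[r] > best_len:
            best_len, best_pos = lcp[r], min(a, b)  # min(a, b) lies inside s
    return s[best_pos:best_pos + best_len]
```

Because the sentinel appears exactly once, no common prefix of a suffix from one string and a suffix from the other can extend across it, so every reported match lies entirely inside both strings.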

edited yesterday

answered 2 days ago

Apass.Jack

13.5k1940

Yes. There's even a Wikipedia article about it! https://en.wikipedia.org/wiki/Longest_common_substring_problem

In particular, as Wikipedia explains, there is a linear-time algorithm, using suffix trees (or suffix arrays).

Searching on "longest common substring" turns up that Wikipedia article as the first hit (for me). In the future, please research the problem before asking here. (See, e.g., https://meta.stackoverflow.com/q/261592/781723.)

answered 2 days ago

D.W.♦

102k12127292
