Longest common substring in linear time












We know that the longest common substring of two strings can be found in $\mathcal O(N^2)$ time.
Can a solution be found in only linear time?
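
For reference, the quadratic bound mentioned here is usually obtained with a dynamic-programming table over pairs of end positions. The following is only a minimal sketch of that baseline; the function name `lcs_quadratic` is made up for illustration.

```python
def lcs_quadratic(s: str, t: str) -> str:
    """Baseline O(len(s) * len(t)) dynamic programme (illustrative sketch)."""
    best_len = 0   # length of the best common substring found so far
    best_end = 0   # end index (exclusive) of that substring in s
    # prev[j] = length of the longest common suffix of s[:i-1] and t[:j]
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common suffix that ended one character earlier.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return s[best_end - best_len:best_end]
```

Only two rows of the table are kept, so the extra space is proportional to the length of the second string.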







      algorithms time-complexity strings longest-common-substring














asked 2 days ago by Manoharsinh Rana (edited yesterday by Glorfindel)






















          2 Answers
Let $m$ and $n$ be the lengths of the two given strings.

Linear time assuming the size of the alphabet is constant

Yes, the longest common substring of two given strings can be found in $O(m+n)$ time, assuming the size of the alphabet is constant.

Here is an excerpt from the Wikipedia article on the longest common substring problem:

"The longest common substrings of a set of strings can be found by building a generalized suffix tree for the strings, and then finding the deepest internal nodes which have leaf nodes from all the strings in the subtree below it."

Building a generalized suffix tree for the two given strings takes $O(m+n)$ time using Ukkonen's algorithm. Finding the deepest internal nodes whose subtrees contain leaves from both strings takes $O(m+n)$ time as well, with a single traversal of the tree. Hence we can find the longest common substring in $O(m+n)$ time.
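
To make the "deepest node with leaves from both strings" idea concrete, here is a small sketch. Note that it is not Ukkonen's algorithm and it is not linear: as a simplifying assumption it builds an uncompressed generalized suffix trie by inserting every suffix character by character, which takes quadratic time and space. Only the final search step mirrors the description above. The function name and the bitmask bookkeeping are made up for illustration.

```python
def lcs_via_suffix_trie(s: str, t: str) -> str:
    """Illustration only: naive generalized suffix trie, quadratic time and space."""
    # Each node stores its children and a bitmask of the strings passing through it:
    # bit 1 = reached by a suffix of s, bit 2 = reached by a suffix of t.
    root = {"children": {}, "mask": 0}
    for bit, text in ((1, s), (2, t)):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node["children"].setdefault(ch, {"children": {}, "mask": 0})
                node["mask"] |= bit

    # Depth-first search for the deepest node reached from both strings.
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        for ch, child in node["children"].items():
            if child["mask"] == 3:  # substring occurs in both s and t
                label = path + ch
                if len(label) > len(best):
                    best = label
                # Descendants of a non-shared node cannot be shared,
                # so only shared children are explored further.
                stack.append((child, label))
    return best
```

A real suffix tree compresses unary paths into single edges, which is what brings the size down to $O(m+n)$ nodes; the search for the deepest shared node then works the same way, using string depth instead of node depth.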



For a working implementation, please take a look at "Suffix Tree Application 5 – Longest Common Substring" at GeeksforGeeks.

(Improved!) Linear time

In fact, the longest common substring of two given strings can be found in $O(m+n)$ time regardless of the size of the alphabet.

Here is the abstract of "Computing Longest Common Substrings Via Suffix Arrays" by Maxim Babenko and Tatiana Starikovskaya (2008):

"Given a set of $N$ strings $A = \{\alpha_1,\cdots,\alpha_N\}$ of total length $n$ over alphabet $\Sigma$ one may ask to find, for each $2 \le K \le N$, the longest substring $\beta$ that appears in at least $K$ strings in $A$. It is known that this problem can be solved in $O(n)$ time with the help of suffix trees. However, the resulting algorithm is rather complicated (in particular, it involves answering certain least common ancestor queries in $O(1)$ time). Also, its running time and memory consumption may depend on $|\Sigma|$.

This paper presents an alternative, remarkably simple approach to
the above problem, which relies on the notion of suffix arrays. Once
the suffix array of some auxiliary $O(n)$-length string is computed, one
needs a simple $O(n)$-time postprocessing to find the requested longest
substring. Since a number of efficient and simple linear-time algorithms
for constructing suffix arrays has been recently developed (with constant
not depending on $|\Sigma|$), our approach seems to be quite practical."

Here is the general idea of the algorithm in the paper above. Let the string $\alpha$ be the concatenation of all $\alpha_i$ with separating sentinels. Construct the suffix array of $\alpha$ as well as its longest-common-prefix (LCP) array. Apply a sliding-window technique to these arrays to obtain the longest common substrings.
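
The outline above can be sketched for the two-string case as follows. This is a simplified illustration under several assumptions, not the paper's algorithm verbatim: the suffix array is built by plain sorting (which is not linear; a linear-time construction such as SA-IS or DC3 would be substituted to reach the stated bound), the LCP array uses Kasai's linear-time method, and with only two strings the sliding window is reduced to checking adjacent suffix-array entries that come from different strings. All function names and the choice of `"\x00"` as sentinel are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    # Simple but slow construction by direct sorting (assumption: illustration only).
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp_array(text: str, sa: list[int]) -> list[int]:
    # Kasai's algorithm: lcp[r] is the common prefix length of suffixes sa[r-1] and sa[r].
    n = len(text)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


def lcs_via_suffix_array(s: str, t: str, sentinel: str = "\x00") -> str:
    if sentinel in s or sentinel in t:
        raise ValueError("sentinel must not occur in either input string")
    text = s + sentinel + t
    sa = build_suffix_array(text)
    lcp = build_lcp_array(text, sa)
    split = len(s)  # suffixes starting before this index belong to s
    best_len, best_start = 0, 0
    for r in range(1, len(text)):
        a, b = sa[r - 1], sa[r]
        # Only neighbouring suffixes that originate from different strings count.
        if (a < split) != (b < split) and lcp[r] > best_len:
            best_len, best_start = lcp[r], b
    return text[best_start:best_start + best_len]
```

Because the sentinel occurs exactly once, a common prefix of a suffix of the first string and a suffix of the second can never run across the boundary between them. For more than two strings the window has to span entries covering at least $K$ different strings, and the minimum LCP value inside the window is tracked instead of a single adjacent value.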















answered 2 days ago by Apass.Jack (edited yesterday)


























Yes. There's even a Wikipedia article about it! https://en.wikipedia.org/wiki/Longest_common_substring_problem

In particular, as Wikipedia explains, there is a linear-time algorithm, using suffix trees (or suffix arrays).

Searching on "longest common substring" turns up that Wikipedia article as the first hit (for me). In the future, please research the problem before asking here. (See, e.g., https://meta.stackoverflow.com/q/261592/781723.)







answered 2 days ago by D.W.





























