<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>forcemax's</title>
    <link>https://forcemax.tistory.com/</link>
    <description></description>
    <language>ko</language>
    <pubDate>Sun, 14 Jun 2026 12:26:50 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>forcemax</managingEditor>
    <item>
      <title>If you get error 0x80246019 when updating apps from the Windows Store</title>
      <link>https://forcemax.tistory.com/96</link>
      <description>&lt;p&gt;Run cmd as administrator.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Copy and run the following commands one line at a time:&lt;br /&gt;&lt;span style=&quot;color: #000000;&quot;&gt;taskkill /F /FI &quot;SERVICES eq wuauserv&quot;&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #000000;&quot;&gt;net stop cryptSvc&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #000000;&quot;&gt;net stop bits&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #000000;&quot;&gt;net stop msiserver&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #000000;&quot;&gt;ren C:\Windows\SoftwareDistribution SoftwareDistribution.old&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #000000;&quot;&gt;rmdir C:\Windows\SoftwareDistribution\DataStore&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #000000;&quot;&gt;rmdir C:\Windows\SoftwareDistribution\Download&lt;/span&gt;&lt;/p&gt;
</description>
      <category>Personal</category>
      <author>forcemax</author>
      <guid isPermaLink="true">https://forcemax.tistory.com/96</guid>
      <comments>https://forcemax.tistory.com/96#entry96comment</comments>
      <pubDate>Wed, 30 Oct 2019 20:22:29 +0900</pubDate>
    </item>
    <item>
      <title>Walking through Tag Prediction with fasttext</title>
      <link>https://forcemax.tistory.com/95</link>
      <description>&lt;p&gt;While playing around with &lt;a href=&quot;https://github.com/facebookresearch/fastText&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;fasttext&lt;/a&gt;, released by Facebook, I found examples for text classification but none for tag prediction, so I decided to write one myself.&lt;/p&gt;&lt;p&gt;Tag prediction is described in the following paper.&lt;/p&gt;&lt;h3 style=&quot;box-sizing: border-box; margin-top: 24px; margin-bottom: 16px; font-size: 1.25em; line-height: 1.25; color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, &amp;quot;Segoe UI&amp;quot;, Helvetica, Arial, sans-serif, &amp;quot;Apple Color Emoji&amp;quot;, &amp;quot;Segoe UI Emoji&amp;quot;, &amp;quot;Segoe UI Symbol&amp;quot;;&quot;&gt;&lt;span style=&quot;font-size: 8pt;&quot;&gt;Bag of Tricks for Efficient Text Classification&lt;/span&gt;&lt;/h3&gt;&lt;p style=&quot;box-sizing: border-box; margin-bottom: 16px; color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, &amp;quot;Segoe UI&amp;quot;, Helvetica, Arial, sans-serif, &amp;quot;Apple Color Emoji&amp;quot;, &amp;quot;Segoe UI Emoji&amp;quot;, &amp;quot;Segoe UI Symbol&amp;quot;; font-size: 16px;&quot;&gt;&lt;span style=&quot;font-size: 8pt;&quot;&gt;[2] A. Joulin, E. Grave, P. Bojanowski, T. 
Mikolov,&lt;/span&gt;&lt;span style=&quot;font-size: 8pt;&quot;&gt;&amp;nbsp;&lt;/span&gt;&lt;a href=&quot;https://arxiv.org/abs/1607.01759&quot; style=&quot;box-sizing: border-box; background-color: transparent; color: rgb(64, 120, 192);&quot;&gt;&lt;em style=&quot;box-sizing: border-box;&quot;&gt;&lt;span style=&quot;font-size: 8pt;&quot;&gt;Bag of Tricks for Efficient Text Classification&lt;/span&gt;&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&lt;pre style=&quot;box-sizing: border-box; font-family: Consolas, &amp;quot;Liberation Mono&amp;quot;, Menlo, Courier, monospace; font-size: 14px; margin-top: 0px; margin-bottom: 16px; font-stretch: normal; line-height: 1.45; word-wrap: normal; padding: 16px; overflow: auto; background-color: rgb(247, 247, 247); border-radius: 3px; color: rgb(51, 51, 51);&quot;&gt;&lt;code style=&quot;box-sizing: border-box; font-family: Consolas, &amp;quot;Liberation Mono&amp;quot;, Menlo, Courier, monospace; padding: 0px; margin: 0px; background: transparent; border-radius: 3px; word-break: normal; border: 0px; display: inline; overflow: visible; line-height: inherit; word-wrap: normal;&quot;&gt;&lt;span style=&quot;font-size: 8pt;&quot;&gt;@article{joulin2016bag,
  title={Bag of Tricks for Efficient Text Classification},
  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.01759},
  year={2016}
}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Tag prediction first needs training data. The training data is &lt;a href=&quot;https://webscope.sandbox.yahoo.com/catalog.php?datatype=i&amp;amp;did=67&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;YFCC100M&lt;/a&gt;, a Flickr dataset of 100 million items released by Yahoo. After logging in to Yahoo and entering your AWS information, you can download the data from S3 with the s3cmd command. I will skip that process here. The compressed data totals 14GB.&lt;/p&gt;&lt;p&gt;Once the download finishes, you should see the following list of files.&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(193, 193, 193); background-color: rgb(238, 238, 238); padding: 10px;&quot;&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;forcemax@forcemax-envy14:~/development/tagpred$ ls -ahl ../yfcc100m/&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;합계 17G&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;drwxrwxr-x &amp;nbsp;2 forcemax forcemax 4.0K 10월 &amp;nbsp;7 17:57 .&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;drwxrwxr-x 19 forcemax forcemax 4.0K 10월 &amp;nbsp;4 17:16 ..&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 6.9K &amp;nbsp;9월 30 03:35 WebscopeReadMe.txt&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 2.6G &amp;nbsp;9월 30 03:40 
yfcc100m_autotags-v1.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 1.1G &amp;nbsp;9월 30 03:35 yfcc100m_dataset-0.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 1.1G &amp;nbsp;9월 30 03:35 yfcc100m_dataset-1.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 1.1G &amp;nbsp;9월 30 03:35 yfcc100m_dataset-2.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 1.1G &amp;nbsp;9월 30 03:35 yfcc100m_dataset-3.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 1.1G &amp;nbsp;9월 30 03:35 yfcc100m_dataset-4.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 1.1G &amp;nbsp;9월 30 03:35 yfcc100m_dataset-5.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 1.4G &amp;nbsp;9월 30 03:35 yfcc100m_dataset-6.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 1.4G 
&amp;nbsp;9월 30 03:37 yfcc100m_dataset-7.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 1.4G &amp;nbsp;9월 30 03:37 yfcc100m_dataset-8.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 1.4G &amp;nbsp;9월 30 03:37 yfcc100m_dataset-9.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;-rw-rw-r-- &amp;nbsp;1 forcemax forcemax 2.0G &amp;nbsp;9월 30 03:46 yfcc100m_hash.bz2&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Of these files, we will use yfcc100m_dataset-0 through yfcc100m_dataset-9 (hereafter, the dataset files). First, decompress them with the bzip2 command. (This takes a long time.)&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(193, 193, 193); background-color: rgb(238, 238, 238); padding: 10px;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;forcemax@forcemax-envy14:~/development/tagpred$ bzip2 -d ../yfcc100m/*dataset*.bz2&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The paper states that &lt;b&gt;&lt;span style=&quot;color: rgb(0, 0, 0);&quot;&gt;words appearing fewer than 100 times are removed&lt;/span&gt;&lt;/b&gt;, so we do the same here.&lt;/p&gt;&lt;p&gt;First we need word counts over the dataset files, which total 45GB once decompressed. Because that is a lot of data, we use Python's multiprocessing library to make full use of the machine.&lt;/p&gt;&lt;p&gt;For reference, the dataset files are tab-separated (\t) and consist of the following fields. 
Of these, we will use Title, Description, and User tags.&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(203, 203, 203); background-color: rgb(255, 255, 255); padding: 10px;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Photo/video identifier&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* User NSID&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* User nickname&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Date taken&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Date uploaded&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Capture device&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Title&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Description&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* User tags (comma-separated)&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Machine tags (comma-separated)&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 
9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Longitude&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Latitude&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Accuracy&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Photo/video page URL&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Photo/video download URL&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* License name&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* License URL&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Photo/video server identifier&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Photo/video farm identifier&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Photo/video secret&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Photo/video secret original&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Extension of the original photo&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, 
Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp;* Photos/video marker (0 = photo, 1 = video)&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The following code extracts the Title, Description, and User tags fields from each file, decodes the &lt;a href=&quot;https://en.wikipedia.org/wiki/Percent-encoding&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;Percent-encoding&lt;/a&gt; in them, and then computes word counts, splitting User tags on commas and Title and Description on whitespace.&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(121, 165, 228); background-color: rgb(219, 232, 251); padding: 10px;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;def wordcount_worker(path):&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; print('wordcount worker started : %s' % path)&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; wordcount = collections.Counter()&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; count = 0&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; words = []&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; with open(path) as f:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; for line in f:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, 
Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; count += 1&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; sline = line.split('\t')&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; # user tag&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; words += [k.strip() for k in clean_str(urllib.parse.unquote(sline[8])).replace('+', '_').split(',') if k.strip() != '']&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; # title &amp;amp; description&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; words += [k.strip() for k in clean_str(urllib.parse.unquote_plus(sline[6] + ' ' + sline[7])).split() if k.strip() != '']&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if count % 100000 == 0:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; try:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 
9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; words[:] = (v for v in words if v != '')&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; except ValueError:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pass&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; wordcount.update(words)&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; words[:] = []&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if count % 1000000 == 0:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; print('%s : line %d passed' % (path, count))&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; # count the words left over from the last partial batch&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; wordcount.update(words)&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; print('wordcount worker finished : %s' % path)&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 
9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; return wordcount&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The code above is run with the multiprocessing library, with one worker process per dataset file. Because each dataset file is large, memory usage is considerable, so on my PC with 12GB of RAM I limited it to two workers at a time.&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(121, 165, 228); background-color: rgb(219, 232, 251); padding: 10px;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace;&quot;&gt; &lt;span style=&quot;font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/span&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;&amp;nbsp; wordcount = collections.Counter()&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; with Pool(processes = 2) as pool:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; jobs = pool.imap_unordered(wordcount_worker, files)&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; for res in jobs:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; wordcount.update(res)&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;After computing word counts over all dataset files, we keep only the words that occur at least 100 times.&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(121, 165, 228); background-color: rgb(219, 
232, 251); padding: 10px;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace;&quot;&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 9pt; font-family: Menlo, Monaco, Consolas, monospace;&quot;&gt; keepwords = set()&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; for k in wordcount.keys():&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if wordcount[k] &amp;gt;= 100:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; keepwords.add(k)&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Now we keep only the words in keepwords across all dataset files, drop every other word, and save the result to separate files.&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(121, 165, 228); background-color: rgb(219, 232, 251); padding: 10px;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;def clean_data(tags, titles, descriptions):&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; string = &quot;&quot;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; for t, ti, desc in zip(tags, titles, descriptions):&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, 
Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; t_tags = clean_str(urllib.parse.unquote(t)).replace('+', '_').split(',')&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; t_tags = [k.strip() for k in t_tags if k.strip() in keepwords]&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; t_tags = ['__label__'+k for k in t_tags]&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; t_titles = clean_str(urllib.parse.unquote_plus(ti))&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; t_titles = [k.strip() for k in t_titles.split() if k.strip() in keepwords]&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; t_descriptions = clean_str(urllib.parse.unquote_plus(desc))&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; t_descriptions = [k.strip() for k in t_descriptions.split() if k.strip() in keepwords]&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if len(t_titles) &amp;lt; 1 and len(t_descriptions) &amp;lt; 1:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; 
font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; continue&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if len(t_tags) &amp;lt; 1:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; continue&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if len(t_tags) == 1 and t_tags[0] == '__label__':&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; continue&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; string += &quot;%s %s %s\n&quot; % (' '.join(t_tags), ' '.join(t_titles), ' '.join(t_descriptions))&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; return string&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;def clean_worker(path):&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; print(&quot;clean worker started : %s&quot; % path)&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; tags, titles, descriptions = ([] for i in 
range(3))&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; count = total_count = 0&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; with open(path + '_cleaned', 'w') as w:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; with open(path) as f:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; for line in f:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; count += 1&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; total_count += 1&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; sline = line.split('\t')&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; titles.append(sline[6])&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
descriptions.append(sline[7])&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; tags.append(sline[8])&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if count == CLEANED_TRAIN_FILE_WRITE_INTERVAL:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; w.write(&quot;%s&quot; % clean_data(tags, titles, descriptions))&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; print(&quot;%s line processed : %d&quot; % (path, total_count))&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; tags[:], titles[:], descriptions[:] = ([] for i in range(3))&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; count = 0&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if len(tags) &amp;gt; 0:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, 
Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; w.write(&quot;%s&quot; % clean_data(tags, titles, descriptions))&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; print(&quot;clean worker finished : %s&quot; % path)&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;This code is also run with the&amp;nbsp;multiprocessing library. It uses less memory than the full word-count step, so use as many worker processes as your machine allows.&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(121, 165, 228); background-color: rgb(219, 232, 251); padding: 10px;&quot;&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; with Pool(processes=6) as pool:&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; jobs = pool.imap_unordered(clean_worker, files)&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;br /&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; for res in jobs:&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pass&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The full code for the steps above is available at the following address.&lt;/p&gt;&lt;p&gt;&lt;a 
href=&quot;https://gist.github.com/forcemax/a6b5885fea859b43763f7712e82d546b&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;https://gist.github.com/forcemax/a6b5885fea859b43763f7712e82d546b&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;On a system with an Intel i7-6700HQ, 12GB RAM, and an SSD, this takes about 2 hours to run.&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Now that the data is ready, we can train with fasttext.&lt;/p&gt;&lt;p&gt;The training data is still split across 10 dataset files, so first merge them into a single file.&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(193, 193, 193); background-color: rgb(238, 238, 238); padding: 10px;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;forcemax@forcemax-envy14:~/development/tagpred$ ls -alh ../yfcc100m/*_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 799M 10월 13 16:05 ../yfcc100m/yfcc100m_dataset-0_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 799M 10월 13 16:05 ../yfcc100m/yfcc100m_dataset-1_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 799M 10월 13 16:03 ../yfcc100m/yfcc100m_dataset-2_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 798M 10월 13 16:02 ../yfcc100m/yfcc100m_dataset-3_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 798M 10월 13 15:48 
../yfcc100m/yfcc100m_dataset-4_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 799M 10월 13 15:48 ../yfcc100m/yfcc100m_dataset-5_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 1.6G 10월 13 15:51 ../yfcc100m/yfcc100m_dataset-6_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 1.6G 10월 13 15:51 ../yfcc100m/yfcc100m_dataset-7_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 1.6G 10월 13 15:52 ../yfcc100m/yfcc100m_dataset-8_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 1.6G 10월 13 15:52 ../yfcc100m/yfcc100m_dataset-9_cleaned&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;forcemax@forcemax-envy14:~/development/tagpred$ cat ../yfcc100m/*_cleaned &amp;gt; train.txt&lt;/span&gt;&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;forcemax@forcemax-envy14:~/development/tagpred$ ls -alh train.txt&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;-rw-rw-r-- 1 forcemax forcemax 11G 10월 13 
16:36 train.txt&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;forcemax@forcemax-envy14:~/development/tagpred$ wc train.txt&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;font face=&quot;Menlo, Monaco, Consolas, monospace&quot;&gt;&lt;span style=&quot;font-size: 14.6667px;&quot;&gt;&amp;nbsp; &amp;nbsp;56251384 &amp;nbsp;1196784449 11608667425 train.txt&lt;/span&gt;&lt;/font&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The paper splits the data into a train set, a validation set, and a test set, but it does not state the criteria for the split, so for now I treat the entire dataset as the train set.&lt;/p&gt;&lt;p&gt;I train with the settings from the paper: 200 hidden units, bigrams, and 5 epochs. On my PC, training only works with the&amp;nbsp;negative sampling loss. On a machine with more RAM, hierarchical softmax would probably work as well. I do not recommend trying plain softmax, though: with labels numbering in the hundreds of thousands, training takes far too long. 
(You have already compiled fasttext, right?)&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(193, 193, 193); background-color: rgb(238, 238, 238); padding: 10px;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;forcemax@forcemax-envy14:~/development/tagpred$ time ../fastText/fasttext supervised -input train.txt -output yfcc100m -minCount 1 -dim 200 -lr 0.05 -wordNgrams 2 -bucket 10000000 -epoch 5 -loss ns -thread 6&lt;/span&gt;&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;Read 1253M words&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;Number of words: &amp;nbsp;440477&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;Number of labels: 484657&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;Progress: 100.0% &amp;nbsp;words/sec/thread: 711835 &amp;nbsp;lr: 0.000000 &amp;nbsp;loss: 1.112073 &amp;nbsp;eta: 0h0m&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;real&lt;/span&gt;&lt;span class=&quot;Apple-tab-span&quot; style=&quot;white-space: pre; font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;	&lt;/span&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;28m21.726s&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;user&lt;/span&gt;&lt;span class=&quot;Apple-tab-span&quot; style=&quot;white-space: pre; font-family: Menlo, Monaco, Consolas, monospace; 
font-size: 11pt;&quot;&gt;	&lt;/span&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;147m34.340s&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;sys&lt;/span&gt;&lt;span class=&quot;Apple-tab-span&quot; style=&quot;white-space: pre; font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;	&lt;/span&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;2m29.752s&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The paper reports a vocabulary size of 297,141 and 312,116 tags. My numbers are higher because I did not split the data into a train set, a validation set, and a test set, and because the command above keeps every word (-minCount 1). Training takes over 28 minutes on my system, whereas the paper reports about 13 minutes. The system used in the paper ran 20 threads, so...&lt;/p&gt;&lt;p&gt;The final loss is 1.112073, and increasing the number of epochs lowers it a little further. This is simply what the paper's parameters produce when used as they are. Please check this part yourself.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Let's run predict on an input that was tested in the paper.&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border-style: double; border-width: 3px; border-color: rgb(193, 193, 193); background-color: rgb(238, 238, 238); padding: 10px;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;forcemax@forcemax-envy14:~/development/tagpred$ echo &quot;christmas&quot; | ../fastText/fasttext predict yfcc100m.bin -&amp;nbsp;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-family: Menlo, Monaco, Consolas, monospace; font-size: 11pt;&quot;&gt;__label__christmas&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;That's all. Please let me know if you can improve the performance of the code or find any mistakes. 
^^&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;*** On October 18, 2016, preprocessed YFCC100M data was published on the fasttext GitHub.&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;The data can be downloaded from the&amp;nbsp;&lt;a href=&quot;https://research.facebook.com/research/fasttext/&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;https://research.facebook.com/research/fasttext/&lt;/a&gt; page. It is&amp;nbsp;7.5GB compressed and 18GB once extracted.&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;It is already split into a train set, a validation set, and a test set, so you can train with fasttext right away.&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;In addition, as of October 18 the supervised command of fasttext has a minCountLabel option. To train as in the paper, with words and tags that occur fewer than 100 times removed, set both the&amp;nbsp;minCountLabel option and the minCount option to 100.&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;</description>
      <category>작업</category>
      <author>forcemax</author>
      <guid isPermaLink="true">https://forcemax.tistory.com/95</guid>
      <comments>https://forcemax.tistory.com/95#entry95comment</comments>
      <pubDate>Wed, 12 Oct 2016 15:52:20 +0900</pubDate>
    </item>
    <item>
      <title>Twitter sample public status rolling top count project for Apache storm starter</title>
      <link>https://forcemax.tistory.com/94</link>
      <description>&lt;p&gt;&lt;span style=&quot;font-size: 18pt;&quot;&gt;Reference&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;1. Storm-starter :&amp;nbsp;&lt;/span&gt;&lt;a href=&quot;https://github.com/apache/storm/tree/master/examples/storm-starter&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;https://github.com/apache/storm/tree/master/examples/storm-starter&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;
&lt;/span&gt;&lt;p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;2. Storm&lt;/span&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;&amp;nbsp;:&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;&lt;a href=&quot;https://github.com/apache/storm&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;https://github.com/apache/storm&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;
&lt;/span&gt;&lt;p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;3.&amp;nbsp;Implementing Real-Time Trending Topics With a Distributed Rolling Count Algorithm in Storm :&amp;nbsp;&lt;/span&gt;&lt;a href=&quot;http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;font-size: 14pt; &quot;&gt;&lt;b&gt;&lt;span style=&quot; font-size: 18pt;&quot;&gt;Introduction&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;After finishing a project at work involving Apache Storm (hereafter Storm), I used the break to build a simple project, based on Storm Starter, that could serve as an example for explaining the basics of Storm.&lt;/span&gt;&lt;/p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;
&lt;/span&gt;&lt;p&gt;&lt;span style=&quot;font-size: 9pt; line-height: 1.5;&quot;&gt;This project uses the &lt;/span&gt;&lt;a href=&quot;https://dev.twitter.com/streaming/reference/get/statuses/sample&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;Twitter Sample Public&amp;nbsp;Status API&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;font-size: 9pt; line-height: 1.5;&quot;&gt; (&lt;/span&gt;&lt;a href=&quot;https://dev.twitter.com/streaming/reference/get/statuses/sample&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;https://dev.twitter.com/streaming/reference/get/statuses/sample&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;font-size: 9pt; line-height: 1.5;&quot;&gt;) to take a sample of the Twitter realtime stream data as input. It extracts the HashTag information and, at a fixed interval (emit frequency), outputs the top TOP_N HashTags seen over a fixed period (window length).&lt;/span&gt;&lt;/p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;
&lt;/span&gt;&lt;p&gt;&lt;span style=&quot; font-size: 10pt; line-height: 1.5;&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;Much of the source code comes from the storm-starter project, and&lt;/span&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;&amp;nbsp;Twitter4J is used as the Twitter library.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;
&lt;/span&gt;&lt;p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;Project source :&amp;nbsp;&lt;/span&gt;&lt;a href=&quot;https://github.com/forcemax/storm_twitter_hashtag&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;https://github.com/forcemax/storm_twitter_hashtag&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;
&lt;/span&gt;&lt;p&gt;&lt;span style=&quot; font-size: 10pt; line-height: 1.5;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;font face=&quot;Verdana&quot;&gt;&lt;span style=&quot;font-size: 14pt; line-height: 20px;&quot;&gt;&lt;b&gt;&lt;span style=&quot;font-size: 18pt;&quot;&gt;Running&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;font-size: 9pt; &quot;&gt;0.&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 9pt; line-height: 20px; &quot;&gt;Prerequisites&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;font-size: 9pt; line-height: 20px; &quot;&gt;Java 1.7 or later,&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 9pt; line-height: 20px; &quot;&gt;Storm 0.9.5,&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 9pt; line-height: 20px; &quot;&gt;Maven&lt;/span&gt;&lt;span style=&quot;font-size: 9pt; line-height: 20px; &quot;&gt;, Git&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;font-size: 9pt; line-height: 20px; &quot;&gt;(The topology will not run unless you replace the consumerKey, consumerSecret, accessToken, and accessTokenSecret for the Twitter API with your own values.)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;font face=&quot;Verdana&quot;&gt;&lt;span style=&quot;font-size: 14pt; line-height: 20px;&quot;&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;1. Get the source&lt;/span&gt;&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 1px solid rgb(219, 232, 251); padding: 10px; background-color: rgb(219, 232, 251);&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: 'Courier New';&quot;&gt;&lt;span style=&quot;font-size: 9pt; font-family: 'Courier New';&quot;&gt;$&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 9pt; font-family: 'Courier New';&quot;&gt;git clone&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;font face=&quot;Courier New&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;https://github.com/forcemax/storm_twitter_hashtag&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;span style=&quot;font-size: 14pt;&quot;&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;2. Build the source&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 1px solid rgb(219, 232, 251); padding: 10px; background-color: rgb(219, 232, 251);&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: 'Courier New'; font-size: 9pt;&quot;&gt;$ mvn clean package&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;3. 
Submit the Topology to the Storm Cluster&lt;/span&gt;&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 1px solid rgb(219, 232, 251); padding: 10px; background-color: rgb(219, 232, 251);&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-family: 'Courier New'; font-size: 9pt;&quot;&gt;$ &lt;/span&gt;&lt;font face=&quot;Courier New&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;storm jar&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;StormTwitterHashtag-0.0.1-SNAPSHOT-jar-with-dependencies.jar&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;com.embian.forcemax.twitter.&lt;/span&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;StormTwitterHashtagTopologyRunner&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;server&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;span style=&quot;font-size: 9pt; &quot;&gt;4. Check the result in the Storm UI&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center; clear: none; float: none;&quot;&gt;&lt;span class=&quot;imageblock&quot; style=&quot;display: inline-block; width: 546px;  height: auto; max-width: 100%;&quot;&gt;&lt;img src=&quot;https://t1.daumcdn.net/cfile/tistory/2108054C55AC868917&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Ft1.daumcdn.net%2Fcfile%2Ftistory%2F2108054C55AC868917&quot; width=&quot;546&quot; height=&quot;261&quot; filename=&quot;storm-ui-1.png&quot; filemime=&quot;image/jpeg&quot;/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;Figure. Topology list&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center; clear: none; float: none;&quot;&gt;&lt;span class=&quot;imageblock&quot; style=&quot;display: inline-block; width: 546px;  height: auto; max-width: 100%;&quot;&gt;&lt;img src=&quot;https://t1.daumcdn.net/cfile/tistory/21091A4C55AC868A16&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Ft1.daumcdn.net%2Fcfile%2Ftistory%2F21091A4C55AC868A16&quot; width=&quot;546&quot; height=&quot;260&quot; filename=&quot;storm-ui-2.png&quot; filemime=&quot;image/jpeg&quot;/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;Figure. Topology details&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;&lt;span style=&quot;font-size: 18pt; &quot;&gt;Explanation&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;- For the Rolling Count Algorithm, see the site listed as Reference 3.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;- The Topology is structured as follows. To use the Twitter API you must register an App with Twitter, which gives you the consumerKey, consumerSecret, accessToken, and accessTokenSecret values. Put those values into the following code.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;&lt;script src=&quot;https://gist.github.com/forcemax/2a39b3f11bd424840b34.js&quot;&gt;&lt;/script&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;line-height: 1;&quot;&gt;&lt;span class=&quot;imageblock&quot; style=&quot;display: inline-block; width: 546px; text-align: center; background-color: rgb(238, 238, 238);; height: auto; max-width: 100%;&quot;&gt;&lt;img src=&quot;https://t1.daumcdn.net/cfile/tistory/22038B4755AC8C5004&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Ft1.daumcdn.net%2Fcfile%2Ftistory%2F22038B4755AC8C5004&quot; width=&quot;546&quot; height=&quot;222&quot; filename=&quot;storm-ui-topology.png&quot; filemime=&quot;image/jpeg&quot; style=&quot;text-align: center; background-color: rgb(238, 238, 238);&quot;/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;Figure. Topology diagram&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;font face=&quot;Verdana&quot;&gt;&lt;span style=&quot;font-size: 13.3333330154419px; line-height: 20px;&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;- TwitterSpout uses the Twitter4J library and a LinkedBlockingQueue: &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;each new Public Status is stored in the Queue as it arrives&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;, and &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;each call to nextTuple() takes one off the Queue&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt; and passes it to ExtractHashTagBolt.&lt;/span&gt;&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;&lt;font face=&quot;Verdana&quot;&gt;&lt;span style=&quot;font-size: 13.3333330154419px; line-height: 20px;&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;&lt;script src=&quot;https://gist.github.com/forcemax/313b669aa72746d2559a.js&quot;&gt;&lt;/script&gt;&lt;/span&gt;&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;
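&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;The hand-off between the streaming client and nextTuple() in TwitterSpout can be sketched in Python as follows. This is only a minimal illustration, not the project's Java code: the class name BufferedSpout, the methods on_status and next_tuple, and the capacity parameter are hypothetical, Python's queue.Queue stands in for LinkedBlockingQueue, and dropping statuses when the buffer is full is an assumption.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
import queue


class BufferedSpout:
    """Hypothetical stand-in for a spout that buffers pushed statuses."""

    def __init__(self, capacity=1000):
        # Bounded buffer shared by the client thread and the spout thread.
        self._queue = queue.Queue(maxsize=capacity)

    def on_status(self, status):
        """Called by the streaming client for each new status."""
        try:
            self._queue.put_nowait(status)
        except queue.Full:
            # Assumption: drop the status rather than block the client.
            pass

    def next_tuple(self):
        """Called repeatedly by the framework; returns None when idle."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None
```
&lt;/pre&gt;
&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;Returning None on an empty queue keeps next_tuple non-blocking, which matters because Storm calls nextTuple() in a loop on a single thread.&lt;/span&gt;&lt;/p&gt;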
&lt;p style=&quot;line-height: 1;&quot;&gt;&lt;font face=&quot;Verdana&quot;&gt;&lt;span style=&quot;font-size: 13.3333330154419px; line-height: 20px;&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;- ExtractHashTagBolt extracts only the HashTags from each Public Status it receives and passes them to RollingCountBolt. &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;One emit occurs per HashTag&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;
&lt;p style=&quot;line-height: 1;&quot;&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;&lt;script src=&quot;https://gist.github.com/forcemax/72b6b8a4cccd5681de1c.js&quot;&gt;&lt;/script&gt;&lt;/span&gt;&lt;/p&gt;
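&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;The one-emit-per-HashTag behaviour of ExtractHashTagBolt can be illustrated with a short Python sketch. The function name emit_hashtags is hypothetical, and the regular expression is an assumption made for this example; the actual bolt reads the hashtag entities that Twitter4J provides rather than parsing the text.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
import re

# Assumption: a hashtag is '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def emit_hashtags(status_text):
    """Yield one single-field tuple per hashtag found in the text."""
    for tag in HASHTAG_PATTERN.findall(status_text):
        yield (tag,)


# A status with two hashtags produces two emits.
emitted = list(emit_hashtags("Trying #Storm with #Twitter4J today"))
```
&lt;/pre&gt;
&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;A status without hashtags produces no emits at all, so the downstream count only ever sees hashtags.&lt;/span&gt;&lt;/p&gt;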
&lt;p style=&quot;line-height: 1;&quot;&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;- RollingCountBolt takes a&amp;nbsp;&lt;/span&gt;&lt;span style=&quot; font-size: 13.3333330154419px; line-height: 20px;&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;window length and an emit frequency as constructor arguments. Every emit frequency, it &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;computes the count per word over the data that falls within the window length&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt; and passes the result to IntermediateRankingsBolt. To emit at every emit frequency it uses a &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;TickTuple&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;. The &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;TickTuple&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt; is a feature added in Storm 0.8&amp;nbsp;that &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;generates a Tuple inside a Component (Spout, Bolt) at a fixed interval&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;line-height: 1;&quot;&gt;&lt;span style=&quot; font-size: 13.3333330154419px; line-height: 20px;&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;&lt;script src=&quot;https://gist.github.com/forcemax/3a76166277bc90941d4d.js&quot;&gt;&lt;/script&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
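&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;The relationship between window length and emit frequency in RollingCountBolt can be sketched with a slot-based counter. The Python below is a simplified illustration under stated assumptions, not the bolt's Java code: the class name SlidingWindowCounter and the method names increment and on_tick are chosen for this example, the window is assumed to be a whole multiple of the emit frequency, and on_tick stands for the arrival of a TickTuple.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict


class SlidingWindowCounter:
    """Per-word counts over a window split into equal time slots."""

    def __init__(self, window_length_secs, emit_frequency_secs):
        if window_length_secs % emit_frequency_secs != 0:
            raise ValueError("window must be a multiple of the frequency")
        # One slot per emit interval inside the window.
        self._num_slots = window_length_secs // emit_frequency_secs
        self._slots = [defaultdict(int) for _ in range(self._num_slots)]
        self._head = 0

    def increment(self, word):
        """Count one occurrence in the slot that is currently open."""
        self._slots[self._head][word] += 1

    def on_tick(self):
        """Return the totals for the whole window, then advance it."""
        totals = defaultdict(int)
        for slot in self._slots:
            for word, count in slot.items():
                totals[word] += count
        # Move to the next slot and discard the data that has expired.
        self._head = (self._head + 1) % self._num_slots
        self._slots[self._head].clear()
        return dict(totals)
```
&lt;/pre&gt;
&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;With a window length of 4 seconds and an emit frequency of 2 seconds there are two slots, so a word counted just before one tick is still included in the next tick's totals and disappears on the tick after that.&lt;/span&gt;&lt;/p&gt;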
&lt;p style=&quot;line-height: 1;&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;-&amp;nbsp;&lt;/span&gt;&lt;span style=&quot; font-size: 13.3333330154419px; line-height: 20px;&quot;&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;IntermediateRankingsBolt and TotalRankingsBolt take topN and an emit frequency as constructor arguments. From the incoming &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;counts per word, they pick the top topN words with their counts and emit them at every emit frequency&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 9pt;&quot;&gt;. IntermediateRankingsBolt is given a large parallelism hint and plays the map role of a map-reduce structure, while TotalRankingsBolt is given a parallelism hint of 1 and plays the&amp;nbsp;reduce role.&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot; font-size: 9pt; line-height: 20px;&quot;&gt;Like RollingCountBolt, both use a TickTuple to emit at every emit frequency.&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;line-height: 1;&quot;&gt;&lt;span style=&quot; font-size: 9pt; line-height: 20px;&quot;&gt;&lt;script src=&quot;https://gist.github.com/forcemax/5c976115be31f5e54583.js&quot;&gt;&lt;/script&gt;&lt;/span&gt;&lt;/p&gt;
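&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;The map and reduce roles of the two ranking bolts can be sketched in Python as follows. The function names top_n and merge_rankings are hypothetical, and the sketch assumes that each word reaches exactly one intermediate ranking (as a grouping on the word field would ensure), so merging never has to add counts together.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
import heapq


def top_n(counts, n):
    """Map step: the n highest (word, count) pairs of one partition."""
    # Ties on count are broken by word so the order is deterministic.
    return heapq.nlargest(n, counts.items(), key=lambda kv: (kv[1], kv[0]))


def merge_rankings(partials, n):
    """Reduce step: combine partial rankings into one overall top n."""
    merged = {}
    for ranking in partials:
        for word, count in ranking:
            # Assumption: a word appears in only one partial ranking.
            merged[word] = count
    return top_n(merged, n)


partial_a = top_n({"storm": 5, "kafka": 3, "java": 1}, 2)
partial_b = top_n({"python": 4, "spark": 2}, 2)
overall = merge_rankings([partial_a, partial_b], 3)
```
&lt;/pre&gt;
&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;Because every partition forwards only its own top n entries, the single reduce instance handles at most n entries per partition, which is why a parallelism hint of 1 is sufficient there.&lt;/span&gt;&lt;/p&gt;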
&lt;p style=&quot;line-height: 1;&quot;&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;- Finally, PrinterBolt simply prints the Tuples emitted by TotalRankingsBolt and has no other function.&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;line-height: 1;&quot;&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;&lt;script src=&quot;https://gist.github.com/forcemax/653032f56072f0000289.js&quot;&gt;&lt;/script&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;line-height: 1;&quot;&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;This project is simple enough that the code is easy to follow as long as you understand Storm's&amp;nbsp;Spout (nextTuple and open), Bolt (execute, prepare), and Topology wiring. It is nevertheless a suitable example for seeing how to build a Spout that connects to an external service (Twitter), how to split Bolts into atomic roles, and how to improve performance by tuning the parallelism hint.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot; font-size: 10pt;&quot;&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;The Twitter Sample Public Status API provides only about 20,000 statuses per 10 minutes,&amp;nbsp;&lt;/span&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;so a single Storm machine is enough to process the stream.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot; font-size: 9pt;&quot;&gt;The code is organized to be easy for beginners to read, and I hope it helps those who want to study realtime CEP engines with Storm.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;</description>
      <category>작업</category>
      <category>Apache Storm</category>
      <category>storm-starter</category>
      <category>Twitter</category>
      <category>twitter4j</category>
      <author>forcemax</author>
      <guid isPermaLink="true">https://forcemax.tistory.com/94</guid>
      <comments>https://forcemax.tistory.com/94#entry94comment</comments>
      <pubDate>Mon, 20 Jul 2015 13:58:36 +0900</pubDate>
    </item>
    <item>
      <title>Apache Storm Cluster Fault Tolerance</title>
      <link>https://forcemax.tistory.com/93</link>
      <description>&lt;p&gt;&lt;span style=&quot;font-size: 10pt; line-height: 1.5;&quot;&gt;Reference :&amp;nbsp;&lt;/span&gt;&lt;a href=&quot;https://storm.apache.org/documentation/Fault-tolerance.html&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot; style=&quot;font-size: 9pt; line-height: 1.5;&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;https://storm.apache.org/documentation/Fault-tolerance.html&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;When evaluating an Apache Storm (hereafter Storm)&amp;nbsp;Cluster for production use, you cannot avoid asking whether Nimbus is a &lt;/span&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Single_point_of_failure&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;SPoF(Single Point of Failure)&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;. The Fault Tolerance of a Storm Cluster is examined in detail below.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;1. What happens when a Worker goes down?&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Workers are managed by the Supervisor. When a Worker goes down, &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;the Supervisor restarts the Worker&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;. However, if the Worker keeps failing during&amp;nbsp;startup, Nimbus reassigns it to another Supervisor.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;2. What happens when a Node goes down?&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Here a Node means a machine, specifically a machine on which a Supervisor is running. 
When a Node goes down, Nimbus detects it and &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;reassigns the tasks that were running under that Supervisor to&amp;nbsp;other Supervisors.&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt; In this case a new Worker may be started on another Supervisor,&amp;nbsp;or the tasks may be assigned to a Worker that is already running.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;3. What happens when Nimbus or a&amp;nbsp;Supervisor goes down?&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;According to the Storm documentation,&amp;nbsp;Nimbus and the Supervisor follow a &lt;/span&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Fail-fast&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Fail-Fast&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;&amp;nbsp;&amp;amp; Stateless design. They should be set up so that &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;a tool such as monit or daemontools restarts the process automatically whenever it exits&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;. The documentation does not merely recommend this; it says you must do it. &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Because the state is stored in Zookeeper or on disk, the restarted process reads it back and resumes in the same state as before.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;4. Is Nimbus a SPoF?&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Even if the Nimbus node goes down, Workers running on other Nodes are not affected. If a Worker on another Node&amp;nbsp;goes down, the Supervisor running on that Node restarts it. 
그러나 Worker가&amp;nbsp;재기동될 때 지속적으로 문제가 발생해서 기동에 실패한다면, 해당 Worker는 Down상태로 남는다.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;그러므로 &quot;Nimbus는 SPoF인가?&quot;라는 물음에 대답은 &quot;&lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Nimbus는 일종의 SPoF로 볼 수 있다&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;.&quot;이다. 그러나,&amp;nbsp;실제로 Nimbus node가 Down되어 있는 중에 비극적인 일(Worker가 막 Down된다던지...)만 발생하지 않는다면 큰 문제는 아니다. 향후 Nimbus의 HA를 지원할 계획은 갖고 있다.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;결과적으로,&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Nimbus Node(Nimbus process가 아니라)가 Down 되더라도&amp;nbsp;다른 Node(Supervisor Node)들은 &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;정상적으로 Topology를 실행하고 있는 상태&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;이다. 그러나, 정상적인 상태로 복구하기 위해서는 Nimbus Node를 복구해야한다.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;여기서 Nimbus Node를 복구하는 방법에 두가지 경우가 있을 수 있다.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;1. &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Nimbus Node가 상태 정보&amp;nbsp;데이터(in Disk)를 가지고 복구되는 경우&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt; : Nimbus가 상태 정보를 디스크에서 읽어 들일 수 있으므로, &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;기존 상태 정보를 가지고 기동&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;된다.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;2. 
&lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Nimbus Node에 상태 정보 데이터&lt;/span&gt;&lt;/b&gt;&lt;b style=&quot;font-size: 9pt; line-height: 1.5;&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;(in Disk)&lt;/span&gt;&lt;/b&gt;&lt;b style=&quot;font-size: 9pt; line-height: 1.5;&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;를 가지고 복구 되지 않는 경우&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 9pt; line-height: 1.5;&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt; : 기존 정보가 없으므로 &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Storm Cluster가 새로 기동되는 상태&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;이다. 이 경우 &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;기존 동작중인 Supervisor가 새로운 Nimbus에 연결되면서 기존 Worker를 제거&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;한다. 그러므로, &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Topology를 다시 Submit&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;해야 한다.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;PS. Storm 1.0.0 부터는 HA Nimbus가 지원됩니다~~&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;</description>
      <category>작업</category>
      <category>Apache Storm</category>
      <category>Fault Tolerance</category>
      <category>Nimbus</category>
      <category>spof</category>
      <author>forcemax</author>
      <guid isPermaLink="true">https://forcemax.tistory.com/93</guid>
      <comments>https://forcemax.tistory.com/93#entry93comment</comments>
      <pubDate>Tue, 14 Jul 2015 18:02:33 +0900</pubDate>
    </item>
    <item>
      <title>Dockerizing Embian eFolder Server Application</title>
      <link>https://forcemax.tistory.com/92</link>
      <description>&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;To get proper hands-on experience with &lt;/span&gt;&lt;a href=&quot;http://www.docker.com/&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Docker&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;, which has been a hot topic lately, I decided to Dockerize &lt;/span&gt;&lt;a href=&quot;http://efolder.embian.com/&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Embian&amp;nbsp;&lt;/span&gt;&lt;/a&gt;&lt;a href=&quot;http://efolder.embian.com/&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;eFolder&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt; (hereafter eFolder) Server, the application I have worked with the most. A beginner needs more than a day to install eFolder Server, which makes it a good candidate for Dockerizing.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;I will skip the basic introduction to Docker. 
(&lt;/span&gt;&lt;a href=&quot;http://blog.nacyot.com/articles/2014-01-27-easy-deploy-with-docker/&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;See nacyot's&amp;nbsp;Docker tutorial, &quot;도커(Docker) 튜토리얼 : 깐 김에 배포까지&quot;&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;)&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Environment : Ubuntu 14.04 LTS&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 12pt; line-height: 1.5;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 12pt; line-height: 1.5;&quot;&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt; line-height: 1.5;&quot;&gt;1.&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Installing Docker&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Installation is covered in detail on the Docker site, so I skip it here.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://docs.docker.com/installation/#installation&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Docker Documentation Installation Guide&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 12pt; line-height: 1.5;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 12pt; line-height: 1.5;&quot;&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;2. 
Writing the Dockerfile&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Dockerfile : the file that defines what goes into a Docker image when it is built.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://github.com/forcemax/efolder-docker/blob/master/Dockerfile&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;https://github.com/forcemax/efolder-docker/blob/master/Dockerfile&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 3px double rgb(193, 193, 193); padding: 10px; background-color: rgb(238, 238, 238);&quot;&gt;&lt;pre style=&quot;color: rgb(0, 0, 0); line-height: normal;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;#
# eFolder Dockerfile
#

# Pull base image.
FROM ubuntu:14.04
MAINTAINER Jae-cheol Kim &amp;lt;forcemax@gmail.com&amp;gt;

# Install apache, mysql, mod-perl, php5
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update \
	&amp;amp;&amp;amp; apt-get install -y --no-install-recommends \
		apache2 \
		apache2-data \
	&amp;amp;&amp;amp; apt-get install -y --no-install-recommends \
		libapache2-mod-perl2 \
		libncurses5-dev \
		libdbi-perl \
		libtext-iconv-perl \
		libtimedate-perl \
		libdate-calc-perl \
		libdbd-mysql-perl \
		libnet-dns-perl \
		libmime-lite-perl \
		libossp-uuid-perl \
		libemail-address-perl \
		libmailtools-perl \
		libsoap-lite-perl \
		libsphinx-search-perl \
	&amp;amp;&amp;amp; apt-get install -y --no-install-recommends \
		libapache2-mod-php5 \
		php5-mysql \
	&amp;amp;&amp;amp; apt-get install -y --no-install-recommends \
		git \
	&amp;amp;&amp;amp; apt-get install -y --no-install-recommends \
		mysql-server \
	&amp;amp;&amp;amp; apt-get install -y --no-install-recommends \
		sphinxsearch \
	&amp;amp;&amp;amp; apt-get install -y --no-install-recommends \
		supervisor \
	&amp;amp;&amp;amp; rm -r /var/lib/apt/lists/*

RUN mkdir -p /var/log/supervisor

RUN git clone https://github.com/forcemax/efolder /app
RUN rm -rf /app/.git 

COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
COPY 000-default.conf /etc/apache2/sites-available/000-default.conf
COPY 00.CONFIG /app/etc/00.CONFIG
COPY FILE_crawl.pl /app/etc/FILE_crawl.pl
RUN chmod a+x /app/etc/FILE_crawl.pl
COPY makeSphinxIndex.sh /app/etc/makeSphinxIndex.sh
RUN chmod a+x /app/etc/makeSphinxIndex.sh
COPY sphinx.conf /app/etc/sphinx.conf
COPY searchd.sh /app/etc/searchd.sh
COPY ddns.sh /app/etc/ddns.sh
RUN chmod a+x /app/etc/ddns.sh
COPY init_db.sh /app/doc/db/init_db.sh
RUN chmod a+x /app/doc/db/init_db.sh
COPY setup.php /app/www/eFolderAdmin/Config/setup.php
COPY EmbianSoapHandler.pm /app/src/EmbianSoap/EmbianSoapHandler.pm
COPY CONFIG.pm /app/src/FTPService/lib/perl/eFolder/CONFIG.pm

RUN echo &quot;&quot; &amp;gt;&amp;gt; /etc/crontab
RUN echo &quot;36 4 * * * root ( cd /app/etc ; bash makeSphinxIndex.sh all 2&amp;gt; /dev/null &amp;gt; /dev/null )&quot; &amp;gt;&amp;gt; /etc/crontab
RUN echo &quot;* * * * * root ( cd /app/etc ; bash makeSphinxIndex.sh delta 2&amp;gt; /dev/null &amp;gt; /dev/null )&quot; &amp;gt;&amp;gt; /etc/crontab

RUN ln -fs /usr/share/zoneinfo/Asia/Seoul /etc/localtime

VOLUME [&quot;/eFolder&quot;, &quot;/var/lib/mysql&quot;]

EXPOSE 80
CMD [&quot;/usr/bin/supervisord&quot;]&lt;/span&gt;&lt;/p&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;b style=&quot;font-size: 9pt; line-height: 1.5;&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;FROM&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt; line-height: 1.5;&quot;&gt; specifies which base image to use,&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;MAINTAINER&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt; records who wrote the file,&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;CMD&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt; sets the command to run when&amp;nbsp;a container is started from the image, and&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;ENV, RUN, COPY &lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;together determine what ends up in the image when it is built.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;eFolder Server combines apache2 + mod_perl + mod_php for the web tier, mysql for the DB, and sphinxsearch for fulltext search. Docker generally recommends running a single process per container, but I used supervisord to run several&amp;nbsp;daemons.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://raw.githubusercontent.com/forcemax/efolder-docker/master/supervisord.conf&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;https://github.com/forcemax/efolder-docker/blob/master/supervisord.conf&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 3px double rgb(193, 193, 193); padding: 10px; background-color: rgb(238, 238, 238);&quot;&gt;&lt;pre style=&quot;color: rgb(0, 0, 0); line-height: normal;&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;[supervisord]
nodaemon=true

[program:mysqld]
command=/bin/bash -c &quot;/usr/bin/mysqld_safe&quot;

[program:cron]
command=/bin/bash -c &quot;/usr/sbin/cron -f&quot;

[program:searchd]
command=/bin/bash /app/etc/searchd.sh
startsecs=0

[program:ddns]
command=/bin/bash /app/etc/ddns.sh
startsecs=0

[program:apache2]&lt;/span&gt;&lt;/pre&gt;&lt;pre style=&quot;color: rgb(0, 0, 0); line-height: normal;&quot;&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;command=/bin/bash -c &quot;source /etc/apache2/envvars &amp;amp;&amp;amp; exec /usr/sbin/apache2 -DFOREGROUND&quot;&amp;nbsp;&lt;/span&gt;&lt;/p&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 12pt;&quot;&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;3. Storage for DB and file data&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;eFolder was built for web-hard (online file storage) services, so it naturally stores file data. It also uses a DB for fulltext search and user management, so DB data is stored as well.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;In Docker,&amp;nbsp;&lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;a container is treated as stateless&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;.&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 9pt; line-height: 1.5;&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;That raises questions such as &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;&quot;Won't the&amp;nbsp;DB and file data disappear when the server reboots?&quot; &lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;or &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;&quot;What happens when I modify the image and run it again?&quot;&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;To meet this need for &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Persistent Data&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;,&amp;nbsp;Docker introduced the concept of the &lt;/span&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Data Volume Container&lt;/span&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;.&amp;nbsp;A Data Volume&amp;nbsp;Container does not have to be a running&amp;nbsp;container; other&amp;nbsp;containers simply mount the Data Volume Container's 
volumes and use them.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;- Creating the Data Volume Container&lt;/span&gt;&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 3px double rgb(193, 193, 193); padding: 10px; background-color: rgb(238, 238, 238);&quot;&gt;&lt;p&gt;&lt;font color=&quot;#000000&quot; face=&quot;monospace&quot;&gt;&lt;span style=&quot;line-height: normal; white-space: pre;&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;sudo docker run -i -t --name &lt;/span&gt;&lt;b&gt;&lt;u&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;efolder_mysql_data&lt;/span&gt;&lt;/u&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt; -v /var/lib/mysql -v /eFolder busybox /bin/sh&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;- Using the Data Volume Container from the eFolder Container&lt;/span&gt;&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 3px double rgb(193, 193, 193); padding: 10px; background-color: rgb(238, 238, 238);&quot;&gt;&lt;p&gt;&lt;span style=&quot;color: rgb(0, 0, 0); font-family: monospace; line-height: normal; white-space: pre; font-size: 10pt;&quot;&gt;sudo&lt;/span&gt;&lt;font color=&quot;#000000&quot; face=&quot;monospace&quot;&gt;&lt;span style=&quot;line-height: normal; white-space: pre;&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt; docker run -d -p 80:80 -e HOSTIPADDR=$HOSTIPADDR &lt;/span&gt;&lt;b&gt;&lt;u&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;--volumes-from efolder_mysql_data&lt;/span&gt;&lt;/u&gt;&lt;/b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt; forcemax/efolder:latest&lt;/span&gt;&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 12pt;&quot;&gt;&lt;b&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;4. 
Publishing to Github and Docker Hub&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Everything related to eFolder and to Dockerizing it is on Github, in public repositories that anyone can use.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://github.com/forcemax/efolder&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Github eFolder Repository&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://github.com/forcemax/efolder-docker&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Github eFolder Docker Repository&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;I also created a Repository on Docker Hub and configured it to build the image automatically (automated build). For reference, building the eFolder image on Docker Hub takes about 15 minutes.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://registry.hub.docker.com/u/forcemax/efolder/&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Docker Hub eFolder Repository&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;With the following command you can download and use the image directly instead of building it.&lt;/span&gt;&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 3px double rgb(193, 193, 193); padding: 10px; background-color: rgb(238, 238, 238);&quot;&gt;&lt;p&gt;&lt;font color=&quot;#000000&quot; face=&quot;monospace&quot;&gt;&lt;span style=&quot;line-height: normal; white-space: pre; font-size: 10pt;&quot;&gt;sudo docker pull forcemax/efolder:latest&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Having finished all of this..&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Our existing servers run Ubuntu 12.04 LTS, so eFolder had never been tested on&amp;nbsp;Ubuntu 14.04 LTS. On Ubuntu 14.04 LTS the newer version of the 
&lt;/span&gt;&lt;a href=&quot;http://search.cpan.org/~gaas/HTTP-Message-6.06/lib/HTTP/Message.pm&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;HTTP::Message Perl Module&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt; caused the existing code to raise errors, so I also had to fix the code.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 10pt;&quot;&gt;Making eFolder Server easy to install was the priority, so all daemons run in a single image. With Scale-out in mind, splitting them up per daemon is worth considering.&lt;/span&gt;&lt;/p&gt;</description>
      <category>작업</category>
      <category>docker</category>
      <category>Embian eFolder</category>
      <author>forcemax</author>
      <guid isPermaLink="true">https://forcemax.tistory.com/92</guid>
      <comments>https://forcemax.tistory.com/92#entry92comment</comments>
      <pubDate>Tue, 23 Dec 2014 16:56:57 +0900</pubDate>
    </item>
    <item>
      <title>Configuring jstatd to use VisualVM with JDK 7u40</title>
      <link>https://forcemax.tistory.com/91</link>
      <description>&lt;p&gt;I needed to monitor the memory usage of a program running on a JVM on a server. While searching I found VisualVM, learned that it requires jstatd to be running on the server, and set it up.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Every source I checked said that the following is all you need.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style=&quot;font-size: 9pt; line-height: 1.5;&quot;&gt;~/jstatd.all.policy&lt;/span&gt;&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 3px double rgb(203, 203, 203); background-color: rgb(255, 255, 255); padding: 10px;&quot;&gt;&lt;p&gt;grant codebase &quot;file:${java.home}/../lib/tools.jar&quot; {&lt;/p&gt;&lt;p&gt;&amp;nbsp; &amp;nbsp;permission java.security.AllPermission;&lt;/p&gt;&lt;p&gt;};&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;~/jstatd_run.sh&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 3px double rgb(203, 203, 203); background-color: rgb(255, 255, 255); padding: 10px;&quot;&gt;&lt;p&gt;#!/bin/bash&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;$JAVA_HOME/bin/rmiregistry 2020 &amp;amp;&lt;/p&gt;&lt;p&gt;$JAVA_HOME/bin/jstatd -J-Djava.security.policy=jstatd.all.policy&amp;nbsp;-p 2020 &amp;amp;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;But no data shows up. 
Why?&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;I searched stackoverflow for related material.&lt;/p&gt;&lt;p&gt;People there say it looks like a bug.&lt;/p&gt;&lt;p&gt;After changing the script as follows, it worked.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;~/jstatd_run.sh&lt;/p&gt;&lt;div class=&quot;txc-textbox&quot; style=&quot;border: 3px double rgb(203, 203, 203); padding: 10px;&quot;&gt;&lt;p&gt;#!/bin/bash&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;$JAVA_HOME/bin/rmiregistry 2020 &amp;amp;&lt;/p&gt;&lt;p&gt;$JAVA_HOME/bin/jstatd -J-Djava.security.policy=jstatd.all.policy&amp;nbsp;&lt;span style=&quot;color: rgb(255, 0, 0);&quot;&gt;&lt;b&gt;-J-Djava.rmi.server.hostname=192.168.0.42&lt;/b&gt;&lt;/span&gt; -p 2020 &amp;amp;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;192.168.0.42 is the IP address of that server. &lt;strike&gt;Does it really make sense that it failed just because that was missing?&lt;/strike&gt;&lt;/p&gt;&lt;p&gt;&lt;strike&gt;&lt;br /&gt;&lt;/strike&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;</description>
      <category>작업</category>
      <author>forcemax</author>
      <guid isPermaLink="true">https://forcemax.tistory.com/91</guid>
      <comments>https://forcemax.tistory.com/91#entry91comment</comments>
      <pubDate>Mon, 30 Sep 2013 15:52:24 +0900</pubDate>
    </item>
    <item>
      <title>Settings for the Take LTE</title>
      <link>https://forcemax.tistory.com/90</link>
      <description>&lt;p&gt;&lt;span style=&quot;color: rgb(80, 80, 80); font-family: 굴림; line-height: 20px;&quot;&gt;-&amp;nbsp;When WIFI latency goes up&amp;nbsp;&lt;/span&gt;&lt;/p&gt;&lt;ol style=&quot;list-style-type: decimal;&quot;&gt;&lt;li&gt;&lt;span style=&quot;color: rgb(80, 80, 80); font-family: 굴림; line-height: 20px;&quot;&gt;Dial *#*#80292310#*#*&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style=&quot;color: rgb(80, 80, 80); font-family: 굴림; line-height: 20px;&quot;&gt;Select system setting, then select the wifi item&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style=&quot;color: rgb(80, 80, 80); font-family: 굴림; line-height: 20px;&quot;&gt;Set radiation testmode&amp;nbsp;to on&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Source :&amp;nbsp;&lt;span style=&quot;color: rgb(80, 80, 80); font-family: 굴림; line-height: 20px;&quot;&gt;&lt;a href=&quot;http://www.ppomppu.co.kr/zboard/view.php?id=phone&amp;amp;no=1329679&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;http://www.ppomppu.co.kr/zboard/view.php?id=phone&amp;amp;no=1329679&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;</description>
      <category>개인</category>
      <author>forcemax</author>
      <guid isPermaLink="true">https://forcemax.tistory.com/90</guid>
      <comments>https://forcemax.tistory.com/90#entry90comment</comments>
      <pubDate>Wed, 12 Dec 2012 21:39:34 +0900</pubDate>
    </item>
    <item>
      <title>Social Curation Service</title>
      <link>https://forcemax.tistory.com/89</link>
      <description>&lt;p&gt;&lt;b&gt;Social Curation Service (Crowd curating)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Related blog :&amp;nbsp;&lt;a href=&quot;http://trendinsight.biz/archives/37990&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;http://trendinsight.biz/archives/37990&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Representative site: &lt;span style=&quot;color: rgb(255, 0, 0); &quot;&gt;Pinterest&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;http://www.pinterest.com/&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;http://www.pinterest.com/&lt;/a&gt;&lt;/p&gt;&lt;p&gt;The best blog post I found while researching Pinterest&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;http://mobizen.pe.kr/1156&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;http://mobizen.pe.kr/1156&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Korean news article about Pinterest : &lt;a href=&quot;http://www.hani.co.kr/arti/economy/economy_general/525367.html&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;http://www.hani.co.kr/arti/economy/economy_general/525367.html&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;A well-known Pinterest copy site: &lt;span style=&quot;color: rgb(255, 0, 0); &quot;&gt;P&lt;/span&gt;&lt;span style=&quot;color: rgb(255, 0, 0); &quot;&gt;inspire&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;http://www.pinspire.com/&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;http://www.pinspire.com/&lt;/a&gt; (local site : &lt;a href=&quot;http://www.pinspire.co.kr/&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;http://www.pinspire.co.kr/&lt;/a&gt; )&lt;/p&gt;&lt;p&gt;Article about its business in Korea :&amp;nbsp;&lt;a href=&quot;http://sports.chosun.com/news/news.htm?id=201203210100152040012710&amp;amp;ServiceDate=20120321&quot; target=&quot;_blank&quot; class=&quot;tx-link&quot;&gt;http://sports.chosun.com/news/news.htm?id=201203210100152040012710&amp;amp;ServiceDate=20120321&lt;/a&gt;&lt;/p&gt;</description>
      <category>작업</category>
      <author>forcemax</author>
      <guid isPermaLink="true">https://forcemax.tistory.com/89</guid>
      <comments>https://forcemax.tistory.com/89#entry89comment</comments>
      <pubDate>Tue, 3 Apr 2012 17:56:23 +0900</pubDate>
    </item>
    <item>
      <title>When 3G on the Nexus S drops frequently</title>
      <link>https://forcemax.tistory.com/88</link>
      <description>With the dialer open, enter&amp;nbsp;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;*#*#4636#*#*&amp;nbsp;&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Phone information -&amp;gt; Set preferred network type : (change to WCDMA only)&lt;b&gt;&amp;nbsp;&lt;br /&gt;

&lt;br /&gt;
&lt;br /&gt;PS 1. When SMS messages fail to send&amp;nbsp;&lt;br /&gt;
&lt;/b&gt;&lt;div style=&quot;display: inline !important; line-height: 20px;&quot;&gt;
&lt;font color=&quot;#404040&quot;&gt;Phone information -&amp;gt;&amp;nbsp;&lt;/font&gt;&lt;/div&gt;
&lt;span style=&quot;line-height: 20px; color: rgb(64, 64, 64); &quot;&gt;in the smsc (short message service center) field, &lt;br /&gt;
enter 07912801929190 (for SK, 089128010099012925) and press Update.&lt;br /&gt;
&amp;nbsp;&lt;/span&gt;&lt;div&gt;
&lt;span style=&quot;line-height: 20px; color: rgb(64, 64, 64); &quot;&gt;Reportedly, if the phone registers with SKT instead of KT and then with KT again (or the reverse),&lt;br /&gt;
the SMSC number is reset to 00, and that is what causes the SMS sending errors.&lt;br /&gt;
&lt;br /&gt;Source: Ppomppu (&lt;/span&gt;&lt;font color=&quot;#404040&quot;&gt;&lt;span style=&quot;line-height: 20px;&quot;&gt;http://www.ppomppu.co.kr/zboard/view.php?id=android&amp;amp;page=1&amp;amp;sn1=&amp;amp;divpage=13&amp;amp;sn=off&amp;amp;ss=on&amp;amp;sc=off&amp;amp;select_arrange=headnum&amp;amp;desc=asc&amp;amp;no=63343)&lt;/span&gt;&lt;/font&gt;&lt;span style=&quot;color: rgb(64, 64, 64); line-height: 20px; &quot;&gt;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
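The SMSC strings above appear to follow the SMS PDU address encoding: a length octet, a type-of-address octet (91 for international numbers), then the digits stored as swapped nibble pairs. A minimal sketch, assuming that layout, shows how such a string reads back as a phone number. The helper name decode_smsc is hypothetical and used only for illustration.

```python
def decode_smsc(pdu_hex):
    """Decode an SMSC address written as hex text (assumed PDU layout)."""
    # Split the hex text into octets of two characters each.
    octets = [pdu_hex[i:i + 2] for i in range(0, len(pdu_hex), 2)]
    # First octet: how many octets follow (type-of-address plus digit octets).
    declared_len = int(octets[0], 16)
    type_of_address = octets[1].upper()
    digit_octets = octets[2:1 + declared_len]
    # Each octet holds two decimal digits with the nibbles swapped;
    # a trailing F is padding for odd-length numbers.
    digits = "".join(o[1] + o[0] for o in digit_octets).upper().rstrip("F")
    # Type 91 marks an international number, written with a leading plus sign.
    prefix = "+" if type_of_address == "91" else ""
    return prefix + digits


print(decode_smsc("089128010099012925"))
```

Reading the value this way makes it easy to compare what the phone currently stores with the number your carrier publishes.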
&lt;br /&gt;
&lt;b&gt;
PS 2. &lt;/b&gt;To check whether an OTA update is available&amp;nbsp;&lt;br /&gt;
&lt;br /&gt;
With the dialer open, enter&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;*#*#2432546#*#* &amp;nbsp;&lt;/b&gt;</description>
      <category>개인</category>
      <category>nexus s</category>
      <author>forcemax</author>
      <guid isPermaLink="true">https://forcemax.tistory.com/88</guid>
      <comments>https://forcemax.tistory.com/88#entry88comment</comments>
      <pubDate>Wed, 7 Dec 2011 15:32:56 +0900</pubDate>
    </item>
    <item>
      <title>Using eFolder as a large-scale web-hard solution</title>
      <link>https://forcemax.tistory.com/87</link>
      <description>&amp;nbsp;I have worked at (주)Embian since 2006 and have long had a strong interest in a solution called eFolder.&lt;br /&gt;
&amp;nbsp;eFolder is a solution for &lt;b&gt;large-scale web-hard (online file storage) services&lt;/b&gt;, but it is also a&lt;b&gt; lightweight solution&lt;/b&gt; that works at &lt;b&gt;small and medium scale&lt;/b&gt;.&amp;nbsp;I found it unfortunate that eFolder was unknown outside the company, so I had argued since&amp;nbsp;2010 for releasing it, and &lt;u&gt;in March 2011 it was finally released as open source.&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;Over the past 10 years eFolder has been used in services such as &lt;b&gt;FolderPlus, 소리바다 (Soribada), and DayFolder&lt;/b&gt;; FolderPlus ran on it for 8 years.&lt;br /&gt;
&amp;nbsp;Based on that experience, I decided to blog about how eFolder is used as a '&lt;b&gt;large-scale web-hard solution&lt;/b&gt;' so that there is a record of it.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;First, let's look at how FolderPlus was operated and how large the service was. FolderPlus is a web-hard service launched in 2003 by (주)아이서브. The service was shut down in 2010, but it ranked 5th in the web-hard industry. &lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;txc-textbox&quot; style=&quot;border-top-style: double; border-right-style: double; border-bottom-style: double; border-left-style: double; border-top-width: 3px; border-right-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-top-color: rgb(121, 165, 228); border-right-color: rgb(121, 165, 228); border-bottom-color: rgb(121, 165, 228); border-left-color: rgb(121, 165, 228); background-color: rgb(219, 232, 251); padding-top: 10px; padding-right: 10px; padding-bottom: 10px; padding-left: 10px; &quot;&gt;
&lt;div style=&quot;margin-left: 8em; &quot;&gt;
- Operated on the eFolder solution&lt;br /&gt;
- Total users : 6 million&lt;br /&gt;
- Total storage : 1.5 PB ( 1.5 PB = 1,500 TB = 1,500,000 GB )&lt;br /&gt;
- Total network bandwidth : 60 Gb/s&lt;br /&gt;
- Total servers : about 500 ea&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em; &quot;&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 12em; &quot;&gt;
- Storage servers : about 300 ea&lt;br /&gt;
- Web servers : about 200 ea&lt;br /&gt;
- Other servers : about 30 ea&amp;nbsp;&lt;/div&gt;
&lt;p&gt;&lt;/p&gt;
&lt;/div&gt;
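As a rough sanity check on the figures above, the averages per server and per user can be worked out directly. This is only back-of-the-envelope arithmetic, assuming storage is spread evenly over the storage servers and bandwidth is averaged over all of the roughly 500 servers. The variable names are illustrative and do not come from eFolder.

```python
# Figures quoted in the list above (decimal units, as in the post: 1 PB = 1,000 TB).
total_users = 6_000_000
total_storage_tb = 1_500          # 1.5 PB
total_bandwidth_gbps = 60
storage_servers = 300             # approximate
total_servers = 500               # approximate

# Assumption: load is spread evenly, which a real service never quite achieves.
storage_per_server_tb = total_storage_tb / storage_servers
bandwidth_per_server_mbps = total_bandwidth_gbps * 1000 / total_servers
storage_per_user_mb = total_storage_tb * 1_000_000 / total_users

print(f"storage per storage server: {storage_per_server_tb:.1f} TB")
print(f"bandwidth per server:       {bandwidth_per_server_mbps:.0f} Mb/s")
print(f"storage per user:           {storage_per_user_mb:.0f} MB")
```

Under these assumptions each storage server holds about 5 TB and each user accounts for about 250 MB on average.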
&lt;div style=&quot;&quot;&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;&quot;&gt;
&lt;div style=&quot;text-align: left;&quot;&gt;
&amp;nbsp;eFolder was developed continuously while running a service of this scale, and the open-source release contains all of that 10 years of experience.&lt;br /&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
&lt;div style=&quot;text-align: left;&quot;&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div style=&quot;&quot;&gt;
&lt;div style=&quot;text-align: left;&quot;&gt;
&amp;nbsp;This post summarizes how eFolder was used at FolderPlus and which other technologies were used alongside it.&lt;br /&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div style=&quot;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;&quot;&gt;
&amp;nbsp;The main technologies used at FolderPlus were as follows.&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;&quot;&gt;
&lt;div class=&quot;txc-textbox&quot; style=&quot;border-top-style: double; border-right-style: double; border-bottom-style: double; border-left-style: double; border-top-width: 3px; border-right-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-top-color: rgb(121, 165, 228); border-right-color: rgb(121, 165, 228); border-bottom-color: rgb(121, 165, 228); border-left-color: rgb(121, 165, 228); background-color: rgb(219, 232, 251); padding-top: 10px; padding-right: 10px; padding-bottom: 10px; padding-left: 10px; &quot;&gt;
&lt;li style=&quot;text-align: left;&quot;&gt;OpenAFS : FileSystem for high-capacity services&amp;nbsp;&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &amp;nbsp;- Supports a uniform namespace (the same space can be read and written concurrently)&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &amp;nbsp;- Supports quota and NSS&lt;br /&gt;
&lt;br /&gt;
&lt;/li&gt;
&lt;li style=&quot;text-align: left;&quot;&gt;UAS : authentication system for handling 6 million users&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &amp;nbsp;- Authentication system developed in-house&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &amp;nbsp;- Custom NSS module, with data stored in MySQL&lt;br /&gt;
&lt;br /&gt;
&lt;/li&gt;
&lt;li style=&quot;text-align: left;&quot;&gt;Cache system: a system for distributing traffic&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &amp;nbsp;- In a webhard (online file storage) service, traffic tends to concentrate on a few spots&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &amp;nbsp;- Built on high-performance SSD storage to handle high-speed I/O&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &amp;nbsp;- Includes an algorithm that selects frequently requested files&lt;br /&gt;
&lt;br /&gt;
&lt;/li&gt;
&lt;li style=&quot;text-align: left;&quot;&gt;eFolder solution: high-speed file transfer and a high-performance application server&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &amp;nbsp;- Written with mod_perl2 on Apache&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &amp;nbsp;- CGI programs specialized for I/O handling, tailored to webhard workloads&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &amp;nbsp;- High data reliability based on server-side data processing&lt;/li&gt;
&lt;p&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;
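&lt;div style=&quot;&quot;&gt;
&amp;nbsp;The cache system's algorithm for selecting frequently requested files is not described in this post. As one possible reading, a minimal Python sketch could count requests per file and keep the most requested ones on the SSD tier. The function name pick_hot_files, the capacity parameter, and the sample paths are all hypothetical, and the actual FolderPlus algorithm may have worked differently.&lt;br /&gt;
&lt;/div&gt;

```python
from collections import Counter


def pick_hot_files(request_log, capacity):
    """Return up to `capacity` file paths, most requested first.

    Hypothetical helper: `request_log` is any iterable of requested paths.
    """
    counts = Counter(request_log)
    return [path for path, _ in counts.most_common(capacity)]


# Made-up request log, used only for illustration.
log = ["/a.zip", "/b.iso", "/a.zip", "/c.mp3", "/a.zip", "/b.iso"]
hot = pick_hot_files(log, 2)
print(hot)
```

&lt;div style=&quot;&quot;&gt;
&amp;nbsp;A production version would presumably also age out old counts and weigh file size against cache capacity; those details are left out of this sketch.&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;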
&lt;div style=&quot;&quot;&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;1. OpenAFS (a file system for large-scale storage)&lt;/span&gt;&lt;br /&gt;
http://www.openafs.org/&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;OpenAFS grew out of AFS (Andrew File System), which was developed at Carnegie Mellon University, and it is a &amp;nbsp;&lt;span style=&quot;font-weight: bold;&quot;&gt;file system for distributed computing environments&lt;/span&gt;.&amp;nbsp;&lt;br /&gt;
&amp;nbsp;It basically follows a client-server (CS) architecture, with DB Servers that manage the File Servers. I will explain this structure using the way FolderPlus deployed it as an example.&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;&quot;&gt;
&lt;br /&gt;
&lt;p style=&quot;margin:0&quot;&gt;&lt;div class=&quot;imageblock center&quot; style=&quot;text-align: center; clear: both;&quot;&gt;&lt;img src=&quot;https://t1.daumcdn.net/cfile/tistory/1712E54B4DBFBDAC18&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Ft1.daumcdn.net%2Fcfile%2Ftistory%2F1712E54B4DBFBDAC18&quot; width=&quot;500&quot; height=&quot;447&quot; alt=&quot;&quot; filename=&quot;folderplus 서버 구성.png&quot; filemime=&quot;image/jpeg&quot;/&gt;&lt;/div&gt;
&lt;/p&gt;
(Unfortunately, the mouse pointer ended up in the screenshot.)&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;The image above is a highly simplified diagram of the FolderPlus client-server (CS) structure. Each component is described below. (The client is omitted.)&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
1. Upload Server: used for file uploads&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
2. Download Server: used for file downloads&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
3. Folder Server: used for every operation other than file upload and download&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
4. File Server: where the actual files are stored&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
5. DB Server: stores information about the File Servers&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
&lt;span style=&quot;text-decoration: underline;&quot;&gt;- Clients connect only to servers 1, 2, and 3; they never connect to servers 4 and 5.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;&quot;&gt;
&amp;nbsp;&lt;span style=&quot;font-weight: bold;&quot;&gt;The Upload/Download/Folder Servers run as OpenAFS clients&lt;/span&gt;, and &lt;span style=&quot;font-weight: bold;&quot;&gt;the File/DB Servers run as OpenAFS servers&lt;/span&gt;.&lt;span style=&quot;font-weight: bold;&quot;&gt; The server a user connects to&lt;/span&gt; is not the server that actually holds the file but &lt;span style=&quot;font-weight: bold;&quot;&gt;an OpenAFS client that can access the file&lt;/span&gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp; The structure above looks somewhat complicated, and the reason is a weakness in OpenAFS. The OpenAFS client is a&lt;span style=&quot;font-weight: bold;&quot;&gt; kernel-level driver&lt;/span&gt;, so when it fails, &lt;span style=&quot;font-weight: bold;&quot;&gt;the OS kernel panics&lt;/span&gt; and the machine halts. Because this could affect the whole service, we chose the layout above, complicated as it is, &lt;span style=&quot;font-weight: bold;&quot;&gt;so that a failure on one server does not affect the service&lt;/span&gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;OpenAFS does have this problem, but it is minor compared with the advantages. As noted above, it provides a &lt;span style=&quot;font-weight: bold;&quot;&gt;uniform namespace&lt;/span&gt; and supports &lt;span style=&quot;font-weight: bold;&quot;&gt;quota&lt;/span&gt; and &lt;span style=&quot;font-weight: bold;&quot;&gt;NSS&lt;/span&gt; (Name Service Switch). These are &lt;span style=&quot;font-weight: bold;&quot;&gt;the features most needed to build a large-scale webhard service&lt;/span&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;&lt;div style=&quot;&quot;&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;1. Uniform namespace: &lt;/span&gt;Put simply, &lt;span style=&quot;font-weight: bold;&quot;&gt;all OpenAFS clients share a single space&lt;/span&gt;. Because every client sees the same space, &lt;span style=&quot;font-weight: bold;&quot;&gt;a request to any server reaches the same resource&lt;/span&gt;.&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;2. Quota and NSS support: &lt;/span&gt;Quota is simply a feature that caps maximum disk usage. It is a long-established feature on UNIX-like systems, and with it &lt;span style=&quot;font-weight: bold;&quot;&gt;managing disk usage per user becomes easy&lt;/span&gt;. NSS is the facility through which a system looks up the account information used for authentication, such as each user's home directory, ID, and password entry. It is also used on most UNIX systems, so with NSS support you can &lt;span style=&quot;font-weight: bold;&quot;&gt;build a service that runs on multiple systems with little effort&lt;/span&gt;.&lt;br /&gt;
&lt;/div&gt;&lt;div style=&quot;&quot;&gt;
&lt;br /&gt;
&amp;nbsp;The image above is simplified; the real FolderPlus deployment ran at the following scale.&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
- Upload Server: about 20 servers&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
- Download Server: about 100 servers&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
- Folder Server: about 50 servers&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
- File Server: about 300 servers&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;margin-left: 4em;&quot;&gt;
- DB Server: about 10 servers&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;div style=&quot;&quot;&gt;
&amp;nbsp;
Because this exceeds the number of servers that fit in a single class C range, three class C ranges were used, with each type of server spread evenly across them.&lt;br /&gt;
&lt;/div&gt;
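&lt;div style=&quot;&quot;&gt;
&lt;br /&gt;
&amp;nbsp;As a rough check of the address arithmetic above, the following Python sketch adds up the approximate server counts from this post and compares the total with the usable host addresses in one class C (/24) block. The 192.0.2.0/24 network is a documentation range chosen only for illustration, and the variable names are hypothetical.&lt;br /&gt;
&lt;/div&gt;

```python
import ipaddress
import math

# Approximate server counts quoted in this post.
servers = {"upload": 20, "download": 100, "folder": 50, "file": 300, "db": 10}
total = sum(servers.values())

# A class C (/24) block has 256 addresses; the network and broadcast
# addresses cannot be assigned to hosts.
# 192.0.2.0/24 is a documentation range, used here only as an example.
usable_per_class_c = ipaddress.ip_network("192.0.2.0/24").num_addresses - 2

# Smallest number of class C blocks that can hold every server.
minimum_blocks = math.ceil(total / usable_per_class_c)

print(total, usable_per_class_c, minimum_blocks)  # prints: 480 254 2
```

&lt;div style=&quot;&quot;&gt;
&amp;nbsp;By this count the servers need at least two blocks. The post does not say why three were used; headroom for network equipment or growth is one possible reason, which is an assumption.&lt;br /&gt;
&lt;/div&gt;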
&lt;div style=&quot;&quot;&gt;
&lt;br /&gt;
&amp;nbsp;For more about OpenAFS, see the following link.&lt;br /&gt;
&lt;a title=&quot;Go to [http://www.ibm.com/developerworks/kr/library/os-openafs/index.html].&quot; target=&quot;_blank&quot; href=&quot;http://www.ibm.com/developerworks/kr/library/os-openafs/index.html&quot;&gt;http://www.ibm.com/developerworks/kr/library/os-openafs/index.html&lt;/a&gt;&lt;br /&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;2. UAS (authentication system developed in-house)&lt;br /&gt;
&lt;/span&gt;http://www.embian.com/Embian_eAccount.phtml&lt;br /&gt;
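&lt;br /&gt;&amp;nbsp;UAS is an authentication system built in-house by (주)엠비안 (Embian). It was developed to process user authentication as fast as possible, and in FolderPlus it served 6 million users. The link above does not describe UAS itself, but it does describe Embian eAccount, the predecessor of UAS, so it should be a useful reference.&lt;br /&gt;
&lt;br /&gt;&amp;nbsp;The biggest advantage of UAS is that it uses NSS. As explained in the OpenAFS section above, NSS support makes it possible to run on multiple systems with little effort. In other words, you can build a platform-independent service.&lt;br /&gt;
&amp;nbsp;In addition, it incorporates the expertise of 전성진, formerly head of the (주)엠비안 technology research lab, which gives it very fast response times. &lt;br /&gt;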
&lt;br /&gt;&lt;br /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;== This post is a work in progress. ==&lt;/span&gt;&lt;br style=&quot;font-weight: bold;&quot;&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;
== If you send me questions, I will cover those topics in the post as well. ==&lt;/span&gt;&lt;br /&gt;
&lt;/div&gt;</description>
      <category>작업</category>
      <category>eFolder</category>
      <category>Folderplus</category>
      <author>forcemax</author>
      <guid isPermaLink="true">https://forcemax.tistory.com/87</guid>
      <comments>https://forcemax.tistory.com/87#entry87comment</comments>
      <pubDate>Wed, 13 Apr 2011 15:13:17 +0900</pubDate>
    </item>
  </channel>
</rss>