Worker jobs getting lost since they cannot download storm conf file #56

Closed
echinthaka opened this issue Sep 1, 2015 · 3 comments

Comments

@echinthaka

I launched my first topology, and it seems the worker processes cannot come up because they cannot download the conf file. Here is what I see in the logs:

I0901 00:16:35.558715 19563 fetcher.cpp:214] Fetching URI 'http://hdfs-nn.myhost.com:50070/webhdfs/v1/binaries/storm-mesos/storm-mesos-0.9.3.tgz?op=OPEN'
I0901 00:16:35.558866 19563 fetcher.cpp:125] Fetching URI 'http://hdfs-nn.myhost.com:50070/webhdfs/v1/binaries/storm-mesos/storm-mesos-0.9.3.tgz?op=OPEN' with os::net
I0901 00:16:35.558888 19563 fetcher.cpp:135] Downloading 'http://hdfs-nn.myhost.com:50070/webhdfs/v1/binaries/storm-mesos/storm-mesos-0.9.3.tgz?op=OPEN' to '/mesos/workLogs/slaves/20150730-232738-4076896266-5050-228496-S3/frameworks/20150825-011215-4076896266-5050-231386-0000/executors/SchemaChangeNotificationTopology-1-1441066588/runs/331c3f07-464a-4767-bdb2-4360fe9ce4d6/storm-mesos-0.9.3.tgz?op=OPEN'
I0901 00:16:36.660876 19563 fetcher.cpp:214] Fetching URI 'http://nimbus.myhost.com:39163/conf/storm.yaml'
I0901 00:16:36.660948 19563 fetcher.cpp:125] Fetching URI 'http://nimbus.myhost.com:39163/conf/storm.yaml' with os::net
I0901 00:16:36.660975 19563 fetcher.cpp:135] Downloading 'http://nimbus.myhost.com:39163/conf/storm.yaml' to '/mesos/workLogs/slaves/20150730-232738-4076896266-5050-228496-S3/frameworks/20150825-011215-4076896266-5050-231386-0000/executors/SchemaChangeNotificationTopology-1-1441066588/runs/331c3f07-464a-4767-bdb2-4360fe9ce4d6/storm.yaml'
E0901 00:16:36.726596 19563 fetcher.cpp:141] Error downloading resource, received HTTP/FTP return code 404
Failed to fetch: http://nimbus.myhost.com:39163/conf/storm.yaml
Failed to synchronize with slave (it's probably exited)

The command I use to deploy: ./storm jar ~/Desktop/mytopology.jar com.chinthaka.org.MyAwesomeTopology

My ~/.storm/storm.yaml file:

storm.zookeeper.servers:
- zk.myhost.com
storm.zookeeper.port: 2181
nimbus.host: nimbus.myhost.com

This may be minor, but the worker process is marked as LOST instead of KILLED or FAILED in the Mesos UI.

@echinthaka
Author

Found the reason for this issue. It seems that when the framework sets up a local file server to serve conf files to workers, it passes the serving directory simply as conf, a relative path. This can cause trouble if the startup scripts are called from elsewhere.

protected void setupHttpServer() throws Exception {
    _httpServer = new LocalFileServer();
    _configUrl = _httpServer.serveDir("/conf", "conf", _localFileServerPort);

    LOG.info("Started HTTP server from which config for the MesosSupervisor's may be fetched. URL: " + _configUrl);
}
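
To illustrate why a bare conf argument is fragile, here is a minimal sketch of how a relative directory name resolves against whatever directory the process was started from. The class name, method name, and example directories are hypothetical and not part of storm-mesos; the sketch only uses standard java.nio.file calls.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RelativeConfDirDemo {
    // Resolves a directory name against a given working directory,
    // mirroring how the JVM treats a relative path at runtime.
    static Path resolveAgainst(String workingDir, String dir) {
        return Paths.get(workingDir).resolve(dir).normalize();
    }

    public static void main(String[] args) {
        // Hypothetical install location: "conf" points at the real conf directory.
        System.out.println(resolveAgainst("/opt/storm-mesos", "conf"));
        // Hypothetical launch from a home directory: "conf" points somewhere else.
        System.out.println(resolveAgainst("/home/user", "conf"));
        // What "conf" means for the current process.
        System.out.println(Paths.get("conf").toAbsolutePath());
    }
}
```

If the second case applies, the file server would presumably have no storm.yaml to serve from that directory, which is consistent with the 404 in the fetcher log above.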

The fix here would be to have a configuration parameter to point to the correct conf directory path in these situations. I fixed this locally and will submit a PR later.
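
A rough sketch of what such a parameter could look like is below. This is not the code from the PR. The config key mesos.local.file.server.conf.dir, the class, and the helper names are all assumptions made up for illustration. The idea is to read the directory from configuration, fall back to conf when it is unset, and turn it into an absolute path that is checked before being handed to the file server.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class ConfDirResolver {
    // Hypothetical config key; not an actual storm-mesos setting.
    static final String CONF_DIR_KEY = "mesos.local.file.server.conf.dir";

    // Returns the configured conf directory as an absolute, normalized path,
    // falling back to the relative default "conf" when the key is absent.
    static Path resolveConfDir(Map<String, Object> conf) {
        Object configured = conf.get(CONF_DIR_KEY);
        String dir = (configured != null) ? configured.toString() : "conf";
        return Paths.get(dir).toAbsolutePath().normalize();
    }

    // Fails fast at startup instead of letting workers hit a 404 later.
    static Path requireConfDir(Map<String, Object> conf) {
        Path dir = resolveConfDir(conf);
        if (!Files.isDirectory(dir)) {
            throw new IllegalStateException("Conf directory not found: " + dir);
        }
        return dir;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        // Use the system temp directory as a stand-in for a real conf directory.
        conf.put(CONF_DIR_KEY, System.getProperty("java.io.tmpdir"));
        System.out.println("Serving conf from: " + requireConfDir(conf));
    }
}
```

The resulting path could then be passed in place of the literal "conf" argument to serveDir, assuming that method accepts an absolute directory.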

@erikdw
Collaborator

erikdw commented Sep 3, 2015

@echinthaka has sent a PR with a proposed fix: #57

@erikdw changed the title from "Worker jobs getting lost since they can not download storm conf file" to "Worker jobs getting lost since they cannot download storm conf file" on Nov 24, 2015
@erikdw
Collaborator

erikdw commented Nov 24, 2015

#57 has 2 separate changes embedded in it, at least in its initial incarnation:

  • the 1st change is intended to fix this "serving config file" issue (#56), but it never got merged because of the 2nd change:
  • the 2nd change is more impactful: it uses wget to download the executor tarball. This change is more involved and more questionable, and shouldn't have been bundled with the 1st change. That issue is still outstanding and deserves its own full-fledged GitHub issue in this project.

Notably, the "config file serving" problem is supposedly fixed by #65, per @brndnmtthws's comment in #56:

This might no longer be needed since merging #65. From that PR, the config is generated at runtime and then served up to the supervisors.
