...
- Jobs ran 3 to 4 times faster when we copied cfcache from /staging to local disk. In a small-data-set test with full parameters at CHTC, step05 took only 16.7 hours with cfcache copied to local disk, versus 56.8 hours with cfcache read directly from /staging.
- I had a job killed because it exceeded 72 hours, even though I set +LongJobs = true in the submit file.
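The copy-to-local-disk step could be done at the start of the job by a small wrapper. The following is only a sketch: the function name `stage_cache` is hypothetical, and it assumes the cache is a plain directory tree and that the job has enough local scratch space to hold it.

```python
import os
import shutil
from pathlib import Path


def stage_cache(staging_dir, scratch_dir=None):
    """Copy a cache directory from shared storage to job-local disk.

    Returns the path of the local copy.
    """
    # HTCondor sets _CONDOR_SCRATCH_DIR inside a running job; fall back to
    # the current directory when this is run outside HTCondor.
    scratch = Path(scratch_dir or os.environ.get("_CONDOR_SCRATCH_DIR", "."))
    source = Path(staging_dir)
    local_copy = scratch / source.name
    # dirs_exist_ok lets the copy succeed if the target directory already exists.
    shutil.copytree(source, local_copy, dirs_exist_ok=True)
    return local_copy
```

The job would then point the processing step at the returned path instead of the /staging path, so that repeated cache reads and writes stay on local disk.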
- 2385.0 krowe 9/22 20:43 Error from slot1_1@e2008.chtc.wisc.edu: Job failed to complete in 72 hrs
- What are the clever solutions for submitting N different DAG jobs, each with different parameters?
- T10t34
- J220200-003000
- bin, working, data
- J220600-003000
- bin, working, data
- ...
- T10t35
- J170743-393000
- bin, working, data
- J171241-383000
- bin, working, data
- ...
- ANSWERS:
- INCLUDE syntax for DAGs
- include syntax for submit files
- make a template of files
- use a PRE script that populates things
- condor_submit_dag -usedagdir
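One way to combine the "template" and "populate things" answers is to generate a single DAG file with one node per directory and pass the differing parameters through DAGMan `VARS` lines. This is a sketch, not a tested workflow: the function `make_dag`, the submit file name `step.sub`, and the variable names `group` and `target` are all hypothetical.

```python
def make_dag(targets, submit_file="step.sub"):
    """Return DAG text with one node per (group, target) pair.

    targets maps a top-level name (e.g. T10t34) to the list of
    subdirectory names under it (e.g. J220200-003000).
    """
    lines = []
    for group, names in targets.items():
        for name in names:
            node = f"{group}_{name}"
            # Every node shares the same submit file template.
            lines.append(f"JOB {node} {submit_file}")
            # VARS values can be referenced in the submit file as
            # $(group) and $(target).
            lines.append(f'VARS {node} group="{group}" target="{name}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_dag({"T10t34": ["J220200-003000", "J220600-003000"]}), end="")
```

The shared submit file could then use the variables wherever the jobs differ, for example `initialdir = $(group)/$(target)`.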
- How can we set AWS Tags with condor_annex? We'd like this to track jobs and set billing tags.
- Launch Templates didn't work. I don't think condor_annex supports Launch Templates.
- Use aws-user-data options to condor_annex?
- I have tried all sorts of user-data and default-user-data-file options. On-demand apparently no longer works and I was never able to get something working with spot-fleet. I think all things user-data are non-functional.
- I tried setting a tag in the role defined in config.json (aws-ec2-spot-fleet-tagging-role) but that tag didn't translate to the instance.
- I tried adding a tag to the AMI when creating a new AMI (EC2 → Instances → Actions → Image → Create Image). Didn't work.
- What about self-tagging? The instance looks up its own instance ID and runs the aws CLI to tag itself.
- wget -qO- http://instance-data/latest/meta-data/instance-id
- returns nothing when logged in as nobody (condor_ssh_to_job)
- returns nothing when logged in as centos (ssh -i ~/.ssh/...)
- returns the instance ID when logged in as root (ssh as centos, then sudo su)
- Aha! There is a firewall (iptables) rule blocking exactly this. But I can't figure out what file sets this iptables rule on boot.
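The self-tagging idea could look roughly like the sketch below. It assumes the script runs as root (since the metadata query only succeeded as root), that the metadata service answers plain unauthenticated requests, that the aws CLI is installed on the image, and that the instance's role is allowed to create tags. The function names are hypothetical.

```python
import subprocess
import urllib.request

# Same endpoint as the wget test in the notes.
METADATA_URL = "http://instance-data/latest/meta-data/instance-id"


def fetch_instance_id(url=METADATA_URL, timeout=5):
    """Ask the EC2 instance metadata service for this instance's ID."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode().strip()


def build_tag_command(instance_id, tags):
    """Build the `aws ec2 create-tags` command line for the given tags."""
    cmd = ["aws", "ec2", "create-tags", "--resources", instance_id, "--tags"]
    cmd += [f"Key={key},Value={value}" for key, value in tags.items()]
    return cmd


def tag_self(tags):
    """Look up this instance's ID and apply the tags to it."""
    subprocess.run(build_tag_command(fetch_instance_id(), tags), check=True)
```

Calling `tag_self({"project": "demo"})` from a boot-time script would tag the instance; the tag key and value here are placeholders.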
- I tried adding tags to the JSON file with ResourceType set to instance and, separately, to spot-fleet-request. Neither produced an instance with my tag.
...