Open
Description
I have problems running the following commands in Python:
import ember
ember.create_vectorized_features("/data/ember2018/")
I have installed the dependencies and tried it in Docker with lief versions 0.9.0 and 0.10.1, and I still get the same failure:
ember.create_vectorized_features("./ember/")
Vectorizing training set
0%| | 0/900000 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 44, in vectorize_unpack
return vectorize(*args)
File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 31, in vectorize
feature_vector = extractor.process_raw_features(raw_features)
File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 552, in process_raw_features
feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 552, in <listcomp>
feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 192, in process_raw_features
entry_name_hashed = FeatureHasher(50, input_type="string").transform([raw_obj['entry']]).toarray()[0]
File "/opt/conda/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/sklearn/feature_extraction/_hash.py", line 170, in transform
raise ValueError(
ValueError: Samples can not be a single string. The input must be an iterable over iterables of strings.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 75, in create_vectorized_features
File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 60, in vectorize_subset
File "/opt/conda/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 868, in next
raise value
ValueError: Samples can not be a single string. The input must be an iterable over iterables of strings.
>>>
It seems from the error message that the input is not in the format the vectorizer expects?
Is there a fix for this?
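For reference, here is a minimal sketch that should reproduce the same check outside of ember, assuming a scikit-learn release that includes the single-string validation shown in the traceback. The variable `entry` is a hypothetical stand-in for `raw_obj['entry']`, and wrapping it in an inner list is only one possible workaround, not a confirmed upstream fix.

```python
from sklearn.feature_extraction import FeatureHasher

# Hypothetical stand-in for raw_obj['entry'] from the traceback.
entry = "entry_point_name"
hasher = FeatureHasher(n_features=50, input_type="string")

# One sample passed as a bare string, as on line 192 of ember/features.py.
# Releases that validate the input raise ValueError here; older ones may not.
try:
    hasher.transform([entry])
    single_string_rejected = False
except ValueError as exc:
    single_string_rejected = True
    print("rejected:", exc)

# One sample passed as a list holding a single string token.
entry_name_hashed = hasher.transform([[entry]]).toarray()[0]
print(entry_name_hashed.shape)  # (50,)
```

Note that `[[entry]]` hashes the whole name as one token. As far as I can tell, releases without the check iterated over a bare string character by character, so `[list(entry)]` may be closer to the old behaviour. The two variants give different vectors, and I cannot confirm which one matches features produced with older versions.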