
Commit 45e7ce9

Fix molsa cfb scraping
1 parent 9855e71 commit 45e7ce9


2 files changed: 23 additions & 11 deletions


datapackage_pipelines_budgetkey/pipelines/procurement/calls_for_bids/molsa.py

Lines changed: 10 additions & 3 deletions
@@ -1,5 +1,6 @@
 from hashlib import md5
-from dataflows import Flow, printer
+from dataflows import Flow, printer, update_resource
+from datapackage_pipelines.utilities.resources import PROP_STREAMING
 import requests
 from pyquery import PyQuery as pq

@@ -115,9 +116,15 @@ def flow(*args):
         resolve_ordering_unit(),
         fix_documents(),
         calculate_publication_id(),
+        update_resource(
+            -1, name='molsa',
+            **{
+                PROP_STREAMING: True
+            }
+        ),
         printer()
     )


-if __name__ == '__main__':
-    flow().process()
+# if __name__ == '__main__':
+#     flow().process()
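The new `update_resource(-1, name='molsa', **{PROP_STREAMING: True})` step renames the last resource in the flow to `molsa` and sets the streaming flag, which datapackage-pipelines uses to decide whether a resource's rows are passed on to later steps (here `sample` and `concatenate`). The property goes through `**{...}` because its key, assumed here to be the string `'dpp:streaming'`, contains a colon and cannot be written as a Python keyword argument. The sketch below is a hypothetical, stdlib-only model of that descriptor change, not the dataflows implementation; the starting resource name `res_1` is made up.

```python
import copy

# Assumed value of datapackage_pipelines.utilities.resources.PROP_STREAMING.
PROP_STREAMING = 'dpp:streaming'


def model_update_resource(descriptor, index, **props):
    """Hypothetical model of update_resource(index, **props): return a copy of
    a Data Package descriptor with `props` merged into the resource at
    `index` (negative indices count from the end, as with a Python list)."""
    updated = copy.deepcopy(descriptor)
    updated['resources'][index].update(props)
    return updated


descriptor = {
    'name': 'calls-for-bids',
    'resources': [
        {'name': 'res_1', 'path': 'res_1.csv'},  # hypothetical scraped resource
    ],
}

# 'dpp:streaming' is not a valid identifier, so it is passed via **{...}.
result = model_update_resource(descriptor, -1, name='molsa', **{PROP_STREAMING: True})
print(result['resources'][-1])
# → {'name': 'molsa', 'path': 'res_1.csv', 'dpp:streaming': True}
```

The copy keeps the input descriptor unchanged, mirroring the fact that each flow step emits a new descriptor rather than editing the previous one in place.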

datapackage_pipelines_budgetkey/pipelines/procurement/calls_for_bids/pipeline-spec.yaml

Lines changed: 13 additions & 8 deletions
@@ -1,14 +1,14 @@
-scraper-exemptions:
+calls-for-bids:
   schedule:
     crontab: "0 16 * * *"

   pipeline:
     - run: add_metadata
       parameters:
         name: calls-for-bids
-    # get the main HTML page of the exemptions search
     - flow: molsa
       runner: tzabar
+    - run: sample
     - run: concatenate
       parameters:
         fields:
@@ -34,6 +34,9 @@ scraper-exemptions:
           partners: []

           documents: []
+        target:
+          name: calls_for_bids
+          path: calls_for_bids.csv
     - run: set_types
       parameters:
         types:
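Adding an explicit `target` to `concatenate` fixes the name of the output resource, so the later `set_primary_key` and `dump_to_sql` steps can refer to it as `calls_for_bids`. The sketch below is a simplified, hypothetical model of what `concatenate` does with these parameters, not the processor's real code: each key under `fields` is a target field, its list holds alternative source field names, and the target name is always matched to itself. Only `partners` and `documents` are visible in this hunk (the rest of the mapping is elided from the diff), so the example uses just those two and a made-up row.

```python
def concatenate(resources, fields, target):
    """Simplified model of the `concatenate` processor.

    resources: list of (resource_name, rows) pairs, rows being dicts.
    fields:    mapping target_field -> list of alternative source field names.
    target:    dict with 'name' and 'path' for the single output resource.
    """
    out_rows = []
    for _name, rows in resources:
        for row in rows:
            new_row = {}
            for target_field, sources in fields.items():
                value = None  # fields missing from a source row become None
                # The target field name is tried first, then any listed alternatives.
                for source in [target_field, *sources]:
                    if source in row:
                        value = row[source]
                        break
                new_row[target_field] = value
            out_rows.append(new_row)
    return {'name': target['name'], 'path': target['path'], 'rows': out_rows}


molsa_rows = [{'partners': [], 'documents': [], 'scraper_only': 'x'}]  # hypothetical row
result = concatenate(
    [('molsa', molsa_rows)],
    fields={'partners': [], 'documents': []},
    target={'name': 'calls_for_bids', 'path': 'calls_for_bids.csv'},
)
# set_primary_key and dump_to_sql in the spec look the resource up by this name.
print(result['name'], result['path'])  # calls_for_bids calls_for_bids.csv
print(result['rows'])                  # [{'partners': [], 'documents': []}]
```

In this model, columns not listed under `fields` (here `scraper_only`) do not reach the target resource.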
@@ -45,34 +48,36 @@ scraper-exemptions:

           start_date:
             type: date
-            format: '%Y/%m/%d'
+            format: '%d/%m/%Y'
           claim_date:
             type: date
-            format: '%Y/%m/%d'
+            format: '%d/%m/%Y'

           required_documents:
             type: array
             es:itemType: string

           documents:
+            type: array
             es:itemType: object
             es:schema:
               fields:
                 - {name: link, type: string}
                 - {name: description, type: string}
                 - {name: update_time, type: string}

+
     - run: set_primary_key
       parameters:
-        calls_to_bids:
+        calls_for_bids:
           - publication_id
     - run: dump_to_path
       parameters:
-        out-path: /var/datapackages/procurement/calls_to_bids
+        out-path: /var/datapackages/procurement/calls_for_bids
     - run: dump_to_sql
       parameters:
         tables:
-          calls_to_bids:
-            resource-name: calls_to_bids
+          calls_for_bids:
+            resource-name: calls_for_bids
             mode: update