I have a spark-submit job which is running one Python file called main.py. The other file is alert.py, which is being imported in main.py. Also, main.py is using multiple config files. alert.py is passed in --py-files and the other config files are passed as ...
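A minimal invocation matching that description might look like the following sketch; passing the config files via --files is an assumption here (the original post is truncated), and config1.json/config2.json are placeholder names:

spark-submit \
  --py-files alert.py \
  --files config1.json,config2.json \
  main.py

# --py-files entries are added to the executors' PYTHONPATH, so "import alert" resolves;
# --files entries are copied into each executor's working directory.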
I want to create an external table from more than a single path. I have configured my storage creds and added an external location, and I can successfully create a table using the following code:

create table test.base.Example
using csv
options (
h...
Hi @TimB, you can import data from multiple paths using wildcards or similar patterns when creating an external table in Databricks.
To do so, you can modify the location parameter in the CREATE TABLE stateme...
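A hedged sketch of what that could look like, assuming the wildcard support described above; the storage account, container, and path pattern are placeholders, not values from this thread:

spark.sql("""
CREATE TABLE test.base.example
USING CSV
OPTIONS (header = 'true')
LOCATION 'abfss://container@account.dfs.core.windows.net/landing/2023-*/'
""")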
Avoiding ConcurrentAppendException requires a good partitioning strategy; here my logic works without fault for the "whenMatchedUpdate" and "whenNotMatchedInsert" logic. When using "whenNotMatchedBySourceUpdate", however, it seems that the condition doesn't isolate...
Hi @marcuskw, Based on the provided information and the given code snippet, it seems that the condition in the whenNotMatchedBySourceUpdate clause does not isolate the specific partition in the Delta table.
This can lead to a ConcurrentAppendExc...
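To illustrate the idea, here is a hedged sketch (table, column, and partition values are invented) that pins both the merge condition and the whenNotMatchedBySourceUpdate condition to the same partition, so Delta's conflict detection can prove that concurrent writers touch disjoint files:

from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "test.base.target")     # hypothetical table

(target.alias("t")
    .merge(source_df.alias("s"),
           "t.id = s.id AND t.part_date = '2023-09-01'")   # partition pinned here...
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .whenNotMatchedBySourceUpdate(
        condition="t.part_date = '2023-09-01'",            # ...and here as well
        set={"is_active": "false"})
    .execute())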
Hi @Debayan, in https://learn.microsoft.com/en-us/azure/architecture/databricks-monitoring/application-logs there is a GitHub repository mentioned: https://github.com/mspnp/spark-monitoring. That repository is marked as maintenance mode. Just...
Vectorized Pandas UDFs offer improved performance compared to standard PySpark UDFs by leveraging the power of Pandas and operating on entire columns of data at once, rather than row by row. They provide a more intuitive and familiar programming inter...
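As a hedged illustration of the point (the function and column names are invented), here is a vectorized pandas UDF that receives a whole pd.Series per batch instead of one value at a time:

import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("double")
def fahrenheit_to_celsius(temp_f: pd.Series) -> pd.Series:
    # Pandas arithmetic runs over the entire column batch at once
    return (temp_f - 32) * 5.0 / 9.0

df = spark.createDataFrame([(32.0,), (212.0,)], ["temp_f"])
df.select(fahrenheit_to_celsius("temp_f").alias("temp_c")).show()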
If you have the following Apache Pig FILTER statement:

XCOCD_ACT_Y = FILTER XCOCD BY act_ind == 'Y';

the equivalent code in Apache Spark is:

from pyspark.sql.functions import col

XCOCD_ACT_Y_DF = (XCOCD_DF
    .filter(col("act_ind") == "Y"))
Translating an Apache Pig FILTER statement to Spark requires understanding the differences in syntax and functionality between the two processing frameworks. While both aim to filter data, Spark uses a different syntax and approach, typically involvi...
Hello! I would like to work with Delta tables outside of the Databricks UI notebook. I know that the best option would be to use databricks-connect, but I don’t have Unity Catalog enabled. What would be the most effective way to do so? I know that via JDBC ...
Hi @narvinya,
• Delta tables can be accessed outside of the Databricks UI notebook without using databricks-connect or Unity Catalog.
• Three options are available for working with Delta tables outside of the Databricks UI notebook:
1. Using JDBC: Read and...
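As a sketch of the JDBC-style route, the databricks-sql-connector Python package speaks to the same cluster endpoints over HTTP; the hostname, HTTP path, table name, and token below are placeholders you would copy from the cluster's JDBC/ODBC tab:

from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="sql/protocolv1/o/0/0123-456789-abcdef",             # placeholder
    access_token="dapi...",                                        # personal access token
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM my_schema.my_delta_table LIMIT 10")
        for row in cursor.fetchall():
            print(row)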
Getting this error in Databricks and don't know how to solve it:
OSError: [Errno 7] Argument list too long: '/dbfs/databricks/aaecz/dev/w000aaecz/etl-framework-adb/0.4.31-20230503.131701-1/etl_libraries/utils/datadog/restart_datadog.sh'
If anyone can help.
@MUA
Just a friendly follow-up. Did any of the responses help you to resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.
I am trying to troubleshoot why spill occurred during DeltaOptimizeWrite. I am running a 64-core cluster with 256 GB RAM, which I expect to handle this amount of data (see attached DAG).
You can resolve the spill to memory by increasing the shuffle partitions, but 16 GB of memory spill should not have a major impact on your job execution. Could you share more details on the actual source code that you are running?
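For reference, a hedged snippet of the tuning knob mentioned above (the value 512 is arbitrary; size it from the shuffle stats in your DAG):

# More shuffle partitions -> smaller per-task shuffle blocks, less chance of spill.
spark.conf.set("spark.sql.shuffle.partitions", "512")  # default is 200
# Adaptive query execution can also coalesce or split partitions at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")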
Got the below failure on a scheduled job on an interactive cluster, and the next scheduled run executed fine. I want to know why this error occurred and how I can prevent it from happening again. And how do I debug these errors in the future? com.databricks.backend.commo...
@JKR Just a friendly follow-up. Did any of the responses help you to resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.
Hello, I have a git repository on Databricks with notebooks that are meant to be shared with other users. The reason these notebooks are in git, as opposed to already being in the "shared" workspace, is that they are to be continuously improved and need sepa...
Hello,
Thanks for contacting Databricks Support.
I presume you're looking to transfer files from external repositories to the Databricks workspace. I'm afraid there is currently no direct support for it.
You may consider using the REST API, which allows for...
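For example, here is a minimal sketch of the Workspace Import endpoint (the host, token, and paths are placeholders):

import base64
import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
token = "dapi..."                                             # personal access token

# Notebook source is sent base64-encoded in the request body
with open("shared_notebook.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"{host}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": "/Shared/shared_notebook",
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
)
resp.raise_for_status()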
Hi, we are experiencing instability when executing queries using Databricks Connect. Sometimes we are unable to receive the full result set without encountering an error with the message "Driver is up but is not responsive...". When we run the same query...
Hi @Bagger,
- Error message: "Driver is up but is not responsive..."
- Potential causes:
  - Unreachable cluster
    - Check workspace instance name and cluster ID
    - Verify environment variables on the local development machine
  - Python ve...
Dear Community, hope you are doing well. For the last couple of days I have been seeing very strange issues with my DLT pipeline: every 60-70 minutes it fails in continuous mode with the error INTERNAL_ERROR: Communication lost with driver. Clu...
Hello @Debayan, I am facing the same issue while running a Delta Live Table. This job is running in production, but it's not working in dev. I have tried to increase the worker nodes, but to no avail. Can you please help with this?
Question about Spark checkpoints and offsets in a running stream: when the stream started I needed tons of partitions, so we set it with spark.conf to 5000. As expected, the offsets in the checkpoint contain this info and the job used this value. Then we'...
@Jose Gonzalez thanks for that information! This is super useful. I was struggling to understand why my streaming job was still using 200 partitions. This is quite a pain for me because changing the checkpoint will re-insert all data from the source. Do you know where this can...
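For context: the shuffle-partition count used by a stateful stream is captured at the query's first start and persisted in the checkpoint, so the setting only takes effect when applied before .start() against a fresh checkpoint location. A hedged sketch (paths and column names invented):

spark.conf.set("spark.sql.shuffle.partitions", "200")    # applies only to a NEW checkpoint

query = (spark.readStream
    .format("delta")
    .load("/mnt/source/events")                          # hypothetical source
    .groupBy("user_id")
    .count()
    .writeStream
    .format("delta")
    .outputMode("complete")
    .option("checkpointLocation", "/mnt/chk/events_v2")  # fresh checkpoint dir
    .start("/mnt/output/events_agg"))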