- 1762 Views
- 1 replies
- 1 kudos
When I try to save my file I get: org.apache.spark.sql.AnalysisException: Text data source supports only a single column, and you have 2 columns. Is there any way to save a DataFrame with more than one column to a .txt file?
Latest Reply
Would pyspark.sql.DataFrameWriter.csv work? You could specify the separator (sep) as a tab: df.write.csv(os.path.join(tempfile.mkdtemp(), 'data'), sep='\t')
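A rough sketch of that approach (the two-column DataFrame and output path here are made up, and spark is assumed to be predefined as in a Databricks notebook):

```python
import os
import tempfile

# Hypothetical two-column DataFrame; unlike the text source, the csv
# writer is not limited to a single column.
df = spark.createDataFrame([("a", 1), ("b", 2)], ["letter", "number"])

# Write tab-separated output; sep defaults to ',' so override it.
out_dir = os.path.join(tempfile.mkdtemp(), "data")
df.write.csv(out_dir, sep="\t")
```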
- 757 Views
- 2 replies
- 0 kudos
I'm wondering if there's a way to set a monthly budget and have my workloads stop running if I hit it.
Latest Reply
Cluster Policies would help with this, not only from a cost-management perspective but also for standardization of resources across the organization, as well as simplification for a better user experience. You can find Best Practices on leveraging cluster pol...
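As a rough, unofficial sketch of what such a policy definition could look like (the attribute names and limits here are assumptions chosen to illustrate cost capping; policies themselves are JSON, shown here built from a Python dict):

```python
import json

# Hypothetical policy: cap cluster cost and force auto-termination.
# dbus_per_hour is a virtual cluster-policy attribute used for cost
# control -- verify names and values against the current docs.
policy = {
    "dbus_per_hour": {"type": "range", "maxValue": 10},
    "autotermination_minutes": {"type": "fixed", "value": 30, "hidden": True},
}

# Policies are registered as JSON through the Databricks UI or API.
print(json.dumps(policy, indent=2))
```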
- 957 Views
- 1 replies
- 0 kudos
If I save a dataframe without specifying a location, where will it end up?
Latest Reply
You can't save a dataframe without specifying a location. If you are using the saveAsTable API, then the table will be created in the Hive warehouse location. The default location is /user/hive/warehouse.
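A minimal sketch of checking this yourself, assuming a Databricks notebook where spark is predefined and "my_table" is a hypothetical table name:

```python
# Save as a managed table; it lands under the Hive warehouse directory.
df = spark.createDataFrame([(1, "x")], ["id", "val"])
df.write.saveAsTable("my_table")

# The Location row shows the resolved path,
# e.g. dbfs:/user/hive/warehouse/my_table
spark.sql("DESCRIBE TABLE EXTENDED my_table").show(truncate=False)
```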
- 676 Views
- 1 replies
- 0 kudos
It seems like with both techniques I would end up with a copy of my table. Trying to understand when I should be using a deep clone.
Latest Reply
A deep clone is the recommended way, as it holds the history of the table. Also, DEEP CLONE is faster than the read-and-write approach.
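A minimal sketch of a deep clone, with hypothetical table names:

```python
# DEEP CLONE copies the data files as well as the metadata of the
# source Delta table, so the clone is independent of the source.
spark.sql("""
    CREATE TABLE IF NOT EXISTS target_table
    DEEP CLONE source_table
""")
```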
- 782 Views
- 1 replies
- 0 kudos
I have a table that I need to be continuously streaming into. I know it's best practice to run Optimize on my tables periodically. But if I never stop writing to the table, how and when can I run OPTIMIZE against it?
Latest Reply
If the streaming job is making blind appends to the Delta table, then it's perfectly fine to run the OPTIMIZE query in parallel. However, if the streaming job is performing MERGE or UPDATE, then it can conflict with the OPTIMIZE operations. In such cases w...
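A minimal sketch of the append-only case, with hypothetical table and column names; OPTIMIZE would typically run as its own scheduled job while the stream keeps writing:

```python
# Compact small files produced by the continuous appends.
spark.sql("OPTIMIZE events")

# Optionally co-locate data on a frequently filtered column.
spark.sql("OPTIMIZE events ZORDER BY (event_date)")
```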
- 950 Views
- 1 replies
- 0 kudos
Is there permission control at the folder/file level in DBFS? E.g., if a team member uploads a file to /Filestore/Tables/TestData/testfile, could we set permissions on TestData and/or testfile?
Latest Reply
DBFS does not have ACLs at this point.
- 462 Views
- 1 replies
- 0 kudos
I do not see any best-practice guide for DStream applications in the Databricks docs. Any references?
Latest Reply
DStream is not supported by Databricks. Databricks strongly recommends migrating DStream applications to Structured Streaming: https://kb.databricks.com/streaming/dstream-not-supported.html
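A minimal Structured Streaming sketch standing in for a DStream job, using the built-in rate source and memory sink purely for illustration (a real migration would use e.g. Kafka and Delta instead):

```python
# Streaming DataFrame from the built-in test source (rows per second).
stream_df = spark.readStream.format("rate").load()

# Start the query; Structured Streaming manages it on the cluster.
query = (
    stream_df.writeStream
    .format("memory")
    .queryName("demo")
    .start()
)
```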
- 531 Views
- 1 replies
- 1 kudos
I have a daily OPTIMIZE job running; however, the number of files in storage is not going down. It looks like OPTIMIZE is not helping to reduce the files.
Latest Reply
The files are not physically removed from storage by the OPTIMIZE command. A VACUUM command has to be executed to remove them.
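A minimal sketch, with a hypothetical table name:

```python
# VACUUM deletes data files no longer referenced by the Delta log and
# older than the retention threshold (default 7 days).
spark.sql("VACUUM my_delta_table")

# Same retention expressed explicitly (7 days = 168 hours).
spark.sql("VACUUM my_delta_table RETAIN 168 HOURS")
```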
- 14586 Views
- 1 replies
- 0 kudos
I started working on Databricks. I need to migrate a few streaming jobs from Ambari to Databricks. I deployed one job using a JAR and it is working fine. But when I deployed the second job I faced an error "multiple spark streaming context not allowed". ...
Latest Reply
You can run multiple streaming applications in Databricks clusters. By default, these would run in the same fair scheduling pool. To enable multiple streaming queries to execute jobs concurrently and to share the cluster efficiently, you can set the q...
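A minimal sketch of that pool setup (pool and query names are made up, the rate source and memory sink are for illustration, and spark is assumed predefined):

```python
df = spark.readStream.format("rate").load()

# Each query started after setting the local property runs in its own
# fair-scheduler pool, so the two streams share the cluster concurrently.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool1")
q1 = df.writeStream.format("memory").queryName("stream1").start()

spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool2")
q2 = df.writeStream.format("memory").queryName("stream2").start()
```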