Predicting compute required to run Spark jobs
I'm working on a project to predict the compute (cores) required to run Spark jobs. Has anyone worked on this or something similar before? How did you get started?
- 16 Views
- 0 replies
- 0 kudos
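One way to get started (a minimal sketch, not a production approach): collect historical runs and fit a simple model of cores against input size. Everything below is illustrative — the history data is made up, and a real model would use more features (shuffle bytes, stage counts, runtime SLA):

```python
import math

# Hypothetical sketch: predict cores for a Spark job from input size,
# using an ordinary-least-squares line fit over made-up historical runs.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

# Made-up history: (input GB, cores that gave an acceptable runtime)
history = [(10, 4), (50, 8), (100, 16), (200, 32), (400, 64)]
gb = [h[0] for h in history]
cores = [h[1] for h in history]

a, b = fit_line(gb, cores)

def predict_cores(input_gb):
    # Round up and clamp to at least 1 core.
    return max(1, math.ceil(a * input_gb + b))
```

Calling `predict_cores(150)` then gives a rough core estimate; in practice you would backtest the predictions against actual job runtimes before trusting them for scheduling.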
Hi, While ingesting files from a source folder continuously, I would like to be able to detect the case where files are being deleted. As far as I can tell the Autoloader can not handle the detection of files deleted in the source folder. Hence the c...
Hello, so today I watched the tutorial videos and passed the knowledge test as requested to earn the "Databricks Accredited Lakehouse Fundamentals" Badge. Instead I received the "Certificate of Completion of Fundamentals of the Databricks Lakehouse P...
Hi All, We have a table with an id column generated by uuid(). For ETL we use Databricks/Spark SQL temporary views. We observed strange behavior between a Databricks SQL temp view (create or replace temporary view) and a Spark SQL temp view (df.creat...
In Databricks, where is the Hive metastore located: the control plane or the data plane? For prod systems, what precautions should be taken to secure the Hive metastore?
@as999​ The default metastore is managed by Databricks. If you are concerned about security and would like to have your own metastore, you can go for the external metastore setup. You have the detailed steps in the below doc for setting up the external...
Hello, when developing locally using Databricks Connect, how do I re-establish the SparkSession after the cluster restarts? getOrCreate() seems to return the old, invalid SparkSession even after the cluster restart instead of creating a new one, or am I missing...
If anyone encounters this problem, the solution that worked for me was to restart the Jupyter kernel.
Hello, When using /api/2.0/preview/sql/queries to list out all available queries, I noticed that certain queries were being shown while others were not. I did a small test on my home workspace, and it was able to recognize certain queries when I defin...
Hi, How many queries were returned by the API call in question? The List Queries documentation describes this endpoint as supporting pagination with a default page size of 25; is that how many you saw returned? Query parameters: page_size integer <= 10...
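If the missing queries are just on later pages, draining the endpoint page by page should surface them. A generic paginator sketch: the `page`/`page_size` parameter names mirror the documented query parameters, but `fetch_page` here is a stub you would replace with a real authenticated HTTP call:

```python
# Hypothetical sketch: drain a paginated list endpoint such as
# /api/2.0/preview/sql/queries, which returns results in pages
# (default page_size 25). fetch_page(page, page_size) stands in
# for the real HTTP call and must return a list of items.

def list_all(fetch_page, page_size=25):
    """Call fetch_page(page, page_size) for page = 1, 2, ...
    until a short (or empty) page signals the end."""
    items = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        items.extend(batch)
        if len(batch) < page_size:
            break
        page += 1
    return items
```

With `requests`, `fetch_page` would GET the endpoint with `params={"page": page, "page_size": page_size}` and a bearer token, returning the `results` field of the JSON response.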
I'm trying to build an ETL pipeline in which I'm reading JSONL files from Azure Blob Storage, then trying to transform and load them into Delta tables in Databricks. I have created the below schema for loading my data : schema = StructType([ S...
Try this: add .option("multiline", "true") to your read.
I'm trying to connect to an Oracle server hosted in Azure from an AWS Databricks notebook, but the connection keeps timing out. I tested the connection IP using the telnet <hostIP> 1521 command from another EC2 instance, and that seems to reach the Oracle ...
Hi, I have implemented Unity Catalog with a multi-node cluster in Databricks. The workspace instance profile with EC2 access has also been created in IAM, but we are still having a challenge sending emails from Databricks using the SES service. The same is working ...
Hi, I'm trying to set a dynamic value to use in a DLT query, and the code from the example documentation does not work: SET startDate='2020-01-01'; CREATE OR REFRESH LIVE TABLE filtered AS SELECT * FROM my_table WHERE created_at > ${startDate}; It is g...
Hi @MarkD, You may use set variable_name.var = '1900-01-01' to set the value of the variable, and to read it back, use ${automated_date.var}. Example: set automated_date.var = '1800-01-01' select * from my_table where date = CAST(${autom...
Hi, I am using the CLI to transfer local files to a Databricks Volume. At the end of my upload, I want to create a meta table (storing file name, location, and some other information) and have it as a table on the Databricks Volume. I am not sure how to create ...
Hi @pshuk, Greetings! We understand that you are looking for a CLI command to create a table, but at this moment Databricks doesn't support a CLI command to create tables. You can use the SQL Execution API instead - https://docs.databricks.com/api/workspace/...
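Since the CLI can't create tables, one option after the upload finishes is a single call to the SQL Statement Execution API (POST /api/2.0/sql/statements). A hedged sketch: the host, token, warehouse ID, and table/column names below are placeholders, and only the payload construction is shown; sending it requires a real workspace:

```python
# Hypothetical sketch: build the request body for the SQL Statement
# Execution API to create a metadata table after a CLI upload.
# DATABRICKS_HOST, TOKEN, and the warehouse_id are placeholders.

def create_table_payload(warehouse_id, catalog, schema, table):
    statement = (
        f"CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{table} ("
        "file_name STRING, file_path STRING, uploaded_at TIMESTAMP)"
    )
    return {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",
    }

# Sending it (requires the `requests` package and a real workspace):
# import requests
# resp = requests.post(
#     f"{DATABRICKS_HOST}/api/2.0/sql/statements",
#     headers={"Authorization": f"Bearer {TOKEN}"},
#     json=create_table_payload("abc123", "main", "etl", "upload_meta"),
# )
```

A second statement through the same endpoint could then INSERT one row per uploaded file.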
Hi All, I am trying to create an external table from an Azure Blob Storage container. I receive no errors, but there is no data in the table. The Blob Storage contains 4 CSV files with the same columns and about 10k rows of data. Am I missing someth...
Hi, The code looks completely fine. Please check whether the files use a delimiter other than ','. If your CSV files use a different delimiter, you can specify it in the table definition using the OPTIONS clause. Just to confirm, I created a sample table a...
Hi data experts. I currently have an OLTP (Azure SQL DB) that keeps data only for the past 14 days. We use partition switching to achieve that and have an ETL (Azure Data Factory) process that feeds the data warehouse (Azure Synapse Analytics). My requ...
Hi @Kaniz, I have looked at this topic extensively and have even tried to implement it. I am a champion of Databricks at my organization, but I do not think it currently enables the OLTP scenarios. The closest I have gotten to it is by using the St...
I am not able to run our unit test suite due to a possible bug in the databricks-connect library. The problem is with the DataFrame transformation withColumnRenamed. When I run it on a Databricks cluster (Databricks Runtime 14.3 LTS), the column is ren...
@dbal - can you please try withColumnsRenamed() instead? Reference: https://docs.databricks.com/en/release-notes/dbconnect/index.html#databricks-connect-1430-python