I am using the Databricks JDBC driver to access a Delta Lake. The database URL specifies transportMode=http. I have experimented with setting different values of fetchSize on the java.sql.PreparedStatement object and have monitored memory use within m...
I think there is a Spark configuration for this, but I can't recall it right now. Please try this doc; you may find something there: https://spark.apache.org/docs/latest/configuration.html
I have several functions accessing the same createOrReplaceTempView("viewname"). Does this cause any issues with multiple functions accessing it in a distributed environment?
def get_data_sql(spark_session, data_frame, data_element):
data_fram...
There are two types of views. One is the global temp view: it is available to the whole cluster and every notebook, but it is removed after a cluster restart. The other is the temp view: it is available only at the notebook level, and other notebooks will not be able to ...
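As a rough sketch of the difference described above, assuming a Databricks notebook where `spark` and a DataFrame already exist: `createOrReplaceTempView` and `createOrReplaceGlobalTempView` are the standard DataFrame methods, and global temp views are read through the `global_temp` database. The helper names `register_views` and `read_views` are made up for illustration.

```python
# Sketch only: `spark` and `df` are assumed to come from a Databricks notebook.
# The helper names below are hypothetical, not part of any Databricks API.

def register_views(df, view_name):
    """Register the same DataFrame as a session-scoped and a global temp view."""
    # Visible only to the current Spark session (one notebook in Databricks).
    df.createOrReplaceTempView(view_name)
    # Visible to every session on the same cluster, via the global_temp database.
    df.createOrReplaceGlobalTempView(view_name)


def read_views(spark, view_name):
    """Return (session_view_df, global_view_df) for the given view name."""
    session_df = spark.sql(f"SELECT * FROM {view_name}")
    global_df = spark.sql(f"SELECT * FROM global_temp.{view_name}")
    return session_df, global_df
```

Because `createOrReplaceTempView` silently replaces an existing view with the same name, functions in one notebook that share a view name can overwrite each other's data. Giving each function its own view name avoids that.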
With the recommended autoscaling setting (e.g., https://docs.databricks.com/clusters/cluster-config-best-practices.html), is it possible to dynamically fine-tune a Spark job, given that the number of executors could change at any time?
I have a scenario where I need to read a PDF file from Azure Data Lake blob storage into Databricks, where the connection is made through AD access. Generating a SAS token has been restricted in our environment due to security issues. The below script ca...
I have a visualization in which the X-axis values are displayed correctly in the Query Editor, in the order produced by the SQL query. However, when I add the visualization to a dashboard, the values are suddenly not sorted anymore. How is this possib...
We have further analyzed the visualization problem and found two solutions. The original visualization consists of 1 series and has aggregation enabled in the UI (but it is unused, since the query itself already aggregates). We found that the following tw...
Here is the current output for my select statement. I would like it to return one row for this jobsubmissionid, where it selects only the non-zero value from each of the rows. I tried using SELECT DISTINCT jobsubmissionid but it still returned 5 rows.
Is that the complete query you are using? I'm guessing that you are using select distinct * from table_name. If you want distinct values for an individual column, you have to apply a filter condition or aggregate the data accordingly. Anyways, a complete ...
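One way to read the aggregation suggestion above is to group by jobsubmissionid and take MAX of each value column, so that the non-zero value wins over the zeros. The sketch below uses Python's built-in sqlite3 only so that it runs anywhere; the table name `job_costs`, the columns `cost_a` and `cost_b`, and the sample rows are invented, since the original output is not shown. It assumes each column has at most one non-zero, positive value per jobsubmissionid.

```python
import sqlite3

# Hypothetical table and data standing in for the poster's result set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_costs (jobsubmissionid INTEGER, cost_a REAL, cost_b REAL)"
)
conn.executemany(
    "INSERT INTO job_costs VALUES (?, ?, ?)",
    [(101, 5.0, 0.0), (101, 0.0, 7.5), (101, 0.0, 0.0)],
)

# Collapse the rows: one output row per jobsubmissionid,
# keeping the largest (here: the only non-zero) value of each column.
rows = conn.execute(
    """
    SELECT jobsubmissionid, MAX(cost_a) AS cost_a, MAX(cost_b) AS cost_b
    FROM job_costs
    GROUP BY jobsubmissionid
    """
).fetchall()
print(rows)  # [(101, 5.0, 7.5)]
```

The GROUP BY with MAX pattern is standard SQL, so the same SELECT should carry over to a Databricks SQL query once the real table and column names are substituted.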
If I am new to Databricks and am aiming to get certified at some point in Dec 2022 or Jan 2023, should I be studying the material for Data Engineering with Databricks V2 or V3?
I would suggest going for V3, because the course Data Engineering with Databricks (V3) is the latest version as of now and was released on 14th October 2022. So this version should cover more topics than V2.
No, as of now there is no practice exam available for this certification, but a good way to get an idea of the exam would be to take it once. Databricks runs multiple trainings, and by attending them you can get the voucher code ...
PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. I...
Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition.
Example dataset:
# Prepare Data
data = [('A', 1), \
('A', 2), \
('B', 3)
]
# Create DataFrame
columns= ['col_1', 'col_2']
df = spark.createDataF...
I have tried reading image and video data in Azure Databricks using OpenCV. When I checked the type of the image, it was shown as "NoneType", and when I tried with a video file, the file itself was not being opened. (Note: these files are stored on Azure ...
Hi, I'm working on Azure Databricks and I created two jobs, one based on a Python wheel and the other based on a notebook, with the same code. The code gets data from Azure blob storage, processes it with PySpark, and sends it to EventHub. The whole co...
I have taken the exam and passed, but I have not received the badge and certificate. I have also raised a request but have not received any response yet. It is urgently required. I request the Databricks team to provide me with the same as ...
Hi @Garvita Kumari, just a friendly follow-up. Were you able to get your certification? If yes, then mark the answer as best, or if you need further assistance, kindly let me know. Thanks and Regards
We have assigned 3 dedicated subnets (one per AZ) to the Databricks workspace, each with a /24 CIDR, but noticed that all the jobs are running in a single subnet, which causes AWS_INSUFFICIENT_FREE_ADDRESSES_IN_SUBNET_FAILURE. Is there a way to segregat...
@karthik p I have configured one subnet per AZ (3 in total) and followed the same steps as mentioned in the document. Is there a way to check whether Databricks uses all the subnets or not? @Debayan Mukherjee I am not sure how to use an LB in this set...
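For context on why a single /24 subnet can run out of addresses, here is a back-of-the-envelope calculation. It rests on two assumptions that are not stated in the thread: AWS reserves 5 addresses in every subnet, and Databricks on AWS uses two IP addresses per cluster node. The CIDR `10.0.1.0/24` and the function name `max_nodes` are placeholders.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5   # assumption: AWS reserves 5 addresses in every subnet
IPS_PER_DATABRICKS_NODE = 2   # assumption: Databricks uses two IPs per cluster node


def max_nodes(cidr):
    """Rough upper bound on cluster nodes that fit in one subnet."""
    subnet = ipaddress.ip_network(cidr)
    usable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
    return usable // IPS_PER_DATABRICKS_NODE


# Placeholder CIDR; substitute the real subnet range.
print(max_nodes("10.0.1.0/24"))  # 125
```

Under these assumptions, if every job lands in the same /24 subnet, roughly 125 concurrently running nodes across all clusters would be enough to exhaust it.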
Hello everyone, I have several notebooks (around 10) and I want to run them in a sequential order. At first I thought of using %run but I have a variable that is repeatedly used in every notebook. So now I am thinking to pass that variable from one ma...
Hi @pavan venkata Yes, as the document says, 0 means no timeout. It means that the notebook will take its sweet time to complete execution without throwing an error due to a time limit, whether the notebook takes 1 min, 1 hour, 1 day, or more. H...
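A minimal sketch of the approach discussed above, assuming it is called from a main notebook where `dbutils` is available: `dbutils.notebook.run(path, timeout_seconds, arguments)` is the Databricks notebook workflow call, while the function name `run_notebooks_in_order`, the argument key `shared_value`, and the notebook paths in the usage note are hypothetical.

```python
def run_notebooks_in_order(dbutils, notebook_paths, shared_value, timeout_seconds=0):
    """Run notebooks one after another, passing the same value to each.

    `dbutils` is the object Databricks provides inside a notebook.
    Returns a dict mapping each notebook path to its exit value.
    """
    results = {}
    for path in notebook_paths:
        # timeout_seconds=0 means no timeout: the call blocks until the
        # child notebook finishes, however long that takes.
        # Arguments are passed as strings, so the value is converted here.
        results[path] = dbutils.notebook.run(
            path, timeout_seconds, {"shared_value": str(shared_value)}
        )
    return results
```

In the main notebook this could be called as `run_notebooks_in_order(dbutils, ["./step_1", "./step_2"], "2022-12-01")`. Each child notebook would then read the value with `dbutils.widgets.get("shared_value")`.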